Managing Temperature and Performance of FPGAs in Heterogeneous High-Performance Computing Systems

The integration of field-programmable gate arrays (FPGAs) into large-scale computing systems is gaining attention. In these systems, real-time data handling for networking, scientific computing tasks, and machine learning workloads can be executed with customized datapaths on reconfigurable fabric within heterogeneous compute nodes. At the same time, high-level synthesis (HLS) tools make FPGAs accessible to a wide range of developers. Two challenges remain. First, FPGA thermal management, particularly containing cooling cost and guaranteeing reliability, is a continuing concern. Introducing new heterogeneous components into high-performance computing (HPC) nodes adds further complexity to thermal modeling and management, and the thermal behavior of multi-FPGA systems deployed within large compute clusters remains less explored. Second, even as developers use HLS tools and prioritize among the optimization options available to them, it remains challenging to understand how operations are mapped to FPGAs and to reason about the achieved performance. Mechanisms are needed to monitor and manage FPGAs dynamically. In this thesis, we first present a machine-learning-based model that captures the thermal behavior of FPGA nodes in a cluster. We then discuss and analyze two thermal management strategies guided by this temperature model. Following this, we evaluate the performance of OpenCL-generated FPGA designs with irregular memory access patterns. Finally, we present a software-centric, multi-functional dynamic monitoring and debugging framework that can be used to better understand the activities on FPGAs.
