Enabling Independent Communication for FPGAs in High Performance Computing

  • Joshua Lant

Student thesis: PhD


The landscape of High Performance Computing (HPC) is changing, with increasing heterogeneity, new data-intensive workloads and ever-tighter system power constraints. Given these changes, there has been increased interest in the deployment of FPGA technology within HPC systems. Traditionally, FPGAs have been of limited use to the HPC community. However, there have been many architectural advances in recent years: hardened floating-point operators and on-die CPUs, greater on-chip memory capacity and increased off-chip memory bandwidth, to name but a few. These advances have brought the opportunity to more readily exploit the FPGA’s efficiency and flexibility in HPC. Unfortunately, a number of research problems must still be solved before this can happen. In this thesis we tackle one such problem, concerning the interconnect and its relation to the system architecture.

The interconnect must have several key properties in order to satisfy the demands of large, data-intensive applications and to take advantage of dataflow processing for FPGA-based HPC: (i) it must allow for tight coupling between the FPGA and system memory in both local and remote nodes, which is required to enhance the performance of a number of key workloads that exhibit irregular memory access patterns; (ii) it must allow the FPGA to issue and process network transactions without any CPU intervention, which is required for high-performance inter-FPGA communication and independent scaling (disaggregation) of the FPGA resources; (iii) it must maintain scalability and reliability, properties required of HPC systems but at odds with the other primary requirements. In this thesis we present a novel Network Interface, implemented entirely within the fabric of the FPGA, which attempts to address all of these competing factors. The Network Interface allows for a system architecture which better supports distributed FPGA processing within a shared, global address space.
It provides hardware primitives to support RDMA and shared-memory transfers over a lightweight, custom network protocol, and allows for direct inter-FPGA communication without any CPU intervention, supported via a hardware-offloaded, reliable and connectionless transport layer. The microarchitecture of the Network Interface and transport layer is detailed, along with a number of performance enhancements which reduce the latency and increase the achievable throughput of the system. We assess the consistency issues and network errors which can occur, and show how the Network Interface is able to accept out-of-order packets from the network. In the latter part of the thesis we show the benefits of direct inter-FPGA communication for dataflow processing when compared with a software-based transport, and demonstrate how we can estimate the expected performance of such a system for network-bound processing.
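As a rough illustration of the kind of mechanism a reliable, connectionless transport can use to accept out-of-order packets, the C sketch below tracks per-source receive state with a next-expected sequence number and a fixed-size window bitmap of packets that arrived ahead of order. All names, the window size and the encoding are illustrative assumptions for exposition, not the thesis's actual microarchitecture.

```c
#include <stdint.h>

#define NI_WINDOW 64  /* hypothetical receive window size (fits one 64-bit bitmap) */

typedef struct {
    uint32_t expected;   /* next in-order sequence number from this source */
    uint64_t got_ahead;  /* bit i set => packet (expected + i) already received;
                            bit 0 is clear at rest, since expected has not arrived */
} ni_rx_state_t;

typedef enum { NI_RX_ACCEPT, NI_RX_DUP, NI_RX_OUT_OF_WINDOW } ni_rx_result_t;

/* Classify an arriving packet by its sequence number, tolerating
   out-of-order delivery within the window. */
ni_rx_result_t ni_rx_accept(ni_rx_state_t *s, uint32_t seq)
{
    int32_t delta = (int32_t)(seq - s->expected);  /* wrap-safe distance */
    if (delta < 0)
        return NI_RX_DUP;            /* already delivered in order */
    if (delta >= NI_WINDOW)
        return NI_RX_OUT_OF_WINDOW;  /* too far ahead: drop, await retransmit */
    if (delta == 0) {
        /* in-order arrival: advance past it and any buffered run behind it */
        s->expected++;
        s->got_ahead >>= 1;
        while (s->got_ahead & 1) {
            s->got_ahead >>= 1;
            s->expected++;
        }
        return NI_RX_ACCEPT;
    }
    uint64_t bit = 1ULL << delta;
    if (s->got_ahead & bit)
        return NI_RX_DUP;            /* duplicate of a buffered packet */
    s->got_ahead |= bit;             /* buffer out-of-order arrival */
    return NI_RX_ACCEPT;
}
```

Keeping the state to one counter and one bitmap per source is the sort of structure that maps cheaply onto FPGA block RAM, and the connectionless design means no per-connection setup is needed before a remote FPGA can be addressed.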
Date of Award: 1 Aug 2020
Original language: English
Awarding Institution
  • The University of Manchester
Supervisors: Mikel Luján & Javier Navaridas


  • Reliability
  • Computer Architecture
  • Distributed Computing
  • Transport
  • FPGA
  • HPC
  • High Performance Computing
  • Interconnect
  • Network
