Abstract
FPGAs are being deployed at large scale across data-centres for a variety of applications because of their performance and power benefits. In particular, cloud operators have started providing FPGAs as a Service. However, to integrate FPGAs into a data-centre environment as fully as standard software systems, support for fault tolerance and task migration is essential. In this paper, we propose a live migration technique for FPGA accelerators that supports fault tolerance, system maintenance and resource management. Our technique allows migration of OpenCL accelerators not only within a single FPGA but also across FPGAs with zero downtime. It achieves this by overlapping computation with data movement for OpenCL kernels, transparently to the user. Moreover, distributed check-pointing mechanisms can be employed to recover from unknown faults with minimal loss of completed work. Altogether, it enables system updates such as changing the static FPGA configuration or upgrading the OS without loss of service.
Original language | English |
---|---|
Title of host publication | International Conference on Field-Programmable Technology (FPT) |
Place of Publication | Naha, Okinawa, Japan |
Publisher | IEEE |
Number of pages | 8 |
Publication status | Published - 20 Jul 2019 |