The ever-increasing demand for high-performance Big Data analytics and data processing has paved the way for heterogeneous hardware accelerators, such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs), to be integrated into modern Big Data platforms. Currently, this integration comes at the cost of programmability, as the end-user Application Programming Interface (API) of Big Data frameworks must be altered in order to access the underlying heterogeneous hardware. In some cases, developers are even required to provide their application code in a low-level programming language that targets specific hardware accelerators (e.g., CUDA, OpenCL, etc.). The purpose of this thesis is to identify the current barriers to the automatic acceleration of Big Data applications and to propose techniques that can lift these restrictions. Specifically, this thesis presents the first Big Data platform that can dynamically take advantage of GPUs and FPGAs to accelerate unmodified applications in a manner that is completely agnostic to the user. This novel heterogeneous platform has been prototyped in the context of Apache Flink, a widely used Big Data platform, and TornadoVM, an open-source framework that automatically compiles and executes Java applications on GPUs, FPGAs, and multi-core CPUs. The techniques presented are not bound to these frameworks, and can also be applied to other software platforms with slight modifications. The performance evaluation of the proposed solution has been conducted on both standard benchmarks and industrial use cases, showcasing performance speedups of up to 65x on GPUs and 184x on FPGAs over vanilla Apache Flink running on traditional multi-core CPUs.
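The central claim, acceleration of unmodified applications, can be illustrated with a minimal sketch. The class and method names below are hypothetical; the point is that the user writes an ordinary data-parallel Java loop, and a platform such as the one described (Flink combined with TornadoVM) could transparently offload it to a GPU or FPGA without any source changes:

```java
// Hypothetical user code: a plain Java, map-style computation of the kind
// a transparently accelerating Big Data platform could offload. Each loop
// iteration is independent, so a compiler like TornadoVM can, in principle,
// map iterations to GPU threads without the user altering this source.
public class VectorAdd {
    static void add(float[] a, float[] b, float[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        float[] b = {4f, 5f, 6f};
        float[] c = new float[3];
        add(a, b, c);
        System.out.println(c[0] + " " + c[1] + " " + c[2]);
    }
}
```

On a conventional JVM this runs on the CPU; the thesis's contribution is that the same, untouched code can instead be dispatched to heterogeneous hardware at runtime.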
Date of Award: 31 Dec 2022
Awarding Institution: The University of Manchester
Supervisors: Mikel Luján (Supervisor) & Christos-Efthymios Kotselidis (Supervisor)
Keywords: big data frameworks