Abstract
Fully homomorphic encryption (FHE) enables processing encrypted data without revealing sensitive information, making it applicable in fields like healthcare, finance, and legal.
Despite its benefits, FHE has high computational complexity and performance overhead.
To address this, researchers have explored hardware acceleration using Field-Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs).
FPGAs are suitable for low-latency computations, while GPUs excel in parallel, high-throughput tasks.
However, widespread FHE adoption remains elusive due to unresolved performance issues.
This paper explores the challenges of offloading FHE computations to hardware accelerators, focusing on the OpenFHE library and the Brakerski-Gentry-Vaikuntanathan (BGV) scheme.
It is the first study on adapting this scheme for GPU acceleration.
We profile OpenFHE to identify computational bottlenecks and propose integrating parallelized CUDA computations within OpenFHE.
Our solution, tested with varying numbers of multiplicative depth, shows up to 26x performance improvement over non-accelerated implementations, proving the effectiveness of GPUs for FHE.
However, the end-to-end performance is still up to 2x slower due to the overhead of marshaling and moving data between the CPU and GPU, accounting for over 97\% of execution time.
Despite its benefits, FHE has high computational complexity and performance overhead.
To address this, researchers have explored hardware acceleration using Field-Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs).
FPGAs are suitable for low-latency computations, while GPUs excel in parallel, high-throughput tasks.
However, widespread FHE adoption remains elusive due to unresolved performance issues.
This paper explores the challenges of offloading FHE computations to hardware accelerators, focusing on the OpenFHE library and the Brakerski-Gentry-Vaikuntanathan (BGV) scheme.
It is the first study on adapting this scheme for GPU acceleration.
We profile OpenFHE to identify computational bottlenecks and propose integrating parallelized CUDA computations within OpenFHE.
Our solution, tested with varying numbers of multiplicative depth, shows up to 26x performance improvement over non-accelerated implementations, proving the effectiveness of GPUs for FHE.
However, the end-to-end performance is still up to 2x slower due to the overhead of marshaling and moving data between the CPU and GPU, accounting for over 97\% of execution time.
Original language | English |
---|---|
Publication status | Accepted/In press - 15 Jul 2024 |
Keywords
- data privacy
- fully homomorphic encryption
- hardware acceleration
- GPUs