Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers

Azzam Haidar, Stanimire Tomov, Jack Dongarra, Nicholas Higham

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Low-precision floating-point arithmetic is a powerful tool for accelerating scientific computing applications, especially those in artificial intelligence. Here, we present an investigation showing that other high-performance computing (HPC) applications can also harness this power. Specifically, we use the general HPC problem, Ax = b, where A is a large dense matrix, and a double precision (FP64) solution is needed for accuracy. Our approach is based on mixed-precision (FP16→FP64) iterative refinement, and we generalize and extend prior advances into a framework, for which we develop architecture-specific algorithms and highly tuned implementations. These new methods show how using half-precision Tensor Cores (FP16-TC) for the arithmetic can provide up to 4× speedup. This is due to the performance boost that the FP16-TC provide as well as to the improved accuracy over the classical FP16 arithmetic that is obtained
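
The mixed-precision iterative refinement idea named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' GPU implementation: it assumes classical iterative refinement, emulates FP16 on the CPU by rounding through the `struct` module's half-precision format, and runs the triangular solves in FP64. All function names (`to_fp16`, `lu_factor_fp16`, `lu_solve`, `refine_solve`) are hypothetical and chosen for illustration only.

```python
import struct


def to_fp16(x):
    """Round a Python float (FP64) to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def lu_factor_fp16(A):
    """LU factorization with partial pivoting, rounding every result to FP16.

    Returns the packed factors (unit-lower L below the diagonal, U on and
    above it) and the row permutation. Assumes A is nonsingular and scaled
    so that no intermediate value overflows the FP16 range.
    """
    n = len(A)
    LU = [[to_fp16(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        # Pick the largest pivot in column k and swap it into place.
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = to_fp16(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = to_fp16(LU[i][j] - to_fp16(LU[i][k] * LU[k][j]))
    return LU, perm


def lu_solve(LU, perm, b):
    """Solve with the low-precision factors (the solve itself runs in FP64)."""
    n = len(b)
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):                      # forward substitution, unit L
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    for i in reversed(range(n)):            # backward substitution with U
        for j in range(i + 1, n):
            y[i] -= LU[i][j] * y[j]
        y[i] /= LU[i][i]
    return y


def refine_solve(A, b, tol=1e-12, max_iter=50):
    """Factor once in emulated FP16, then refine the solution in FP64."""
    n = len(b)
    LU, perm = lu_factor_fp16(A)
    x = lu_solve(LU, perm, b)               # initial low-accuracy solution
    b_norm = max(abs(v) for v in b)
    iters = 0
    while iters < max_iter:
        # Residual r = b - A x, computed in FP64 with the original matrix.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * b_norm:
            break
        d = lu_solve(LU, perm, r)           # correction from the cheap factors
        x = [xi + di for xi, di in zip(x, d)]
        iters += 1
    return x, iters
```

As a usage example, calling `refine_solve([[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]], [12.0, 14.0, 22.0])` should recover a solution close to (1, 2, 3). The expensive O(n³) factorization is done once in low precision, while each refinement step costs only O(n²), which is the general reason this family of methods can be faster than a full FP64 factorization for well-conditioned systems.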

Original language: English
Title of host publication: SC '18: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis
Publisher: IEEE
Pages: 1-11
Volume: 0
DOIs
Publication status: Published - 11 Nov 2018

Keywords

  • FP16 Arithmetic
  • Half Precision
  • Mixed Precision Solvers
  • Iterative Refinement Computation
  • GPU Computing
  • Linear algebra
