Simulating low precision floating-point arithmetic

Nicholas Higham, Srikara Pranesh

    Research output: Contribution to journal › Article › peer-review

    Abstract

    The half precision (fp16) floating-point format, defined in the 2008 revision of the IEEE standard for floating-point arithmetic, and a more recently proposed half precision format bfloat16, are increasingly available in GPUs and other accelerators. While the support for low precision arithmetic is mainly motivated by machine learning applications, general purpose numerical algorithms can benefit from it, too, gaining in speed, energy usage, and reduced communication costs. Since the appropriate hardware is not always available, and one may wish to experiment with new arithmetics not yet implemented in hardware, software simulations of low precision arithmetic are needed. We discuss how to simulate low precision arithmetic using arithmetic of higher precision. We examine the correctness of such simulations and explain via rounding error analysis why a natural method of simulation can provide results that are more accurate than actual computations at low precision. We provide a MATLAB function chop that can be used to efficiently simulate fp16, bfloat16, and other low precision arithmetics, with or without the representation of subnormal numbers and with the options of round to nearest, directed rounding, stochastic rounding, and random bit flips in the significand. We demonstrate the advantages of this approach over defining a new MATLAB class and overloading operators.
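    As a rough illustration of the accuracy point made in the abstract, the sketch below simulates fp16 in Python rather than MATLAB. It is not the authors' chop function: the helper names (round_to_fp16, sum_rounding_each_op, sum_rounding_once) are hypothetical, and rounding goes through the standard library's struct format "e" (IEEE binary16, round to nearest). It assumes that the "natural method" means carrying out a whole computation in higher precision and rounding only the final result, and compares that with rounding after every operation.

```python
import struct


def round_to_fp16(x: float) -> float:
    """Round a binary64 value to the nearest fp16 value (hypothetical helper).

    struct's "e" format is IEEE 754 binary16; packing rounds to nearest.
    The result is returned as an ordinary Python float.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def sum_rounding_each_op(values):
    """Sum with every addition rounded to fp16, mimicking fp16 arithmetic."""
    total = 0.0
    for v in values:
        total = round_to_fp16(total + round_to_fp16(v))
    return total


def sum_rounding_once(values):
    """Sum in double precision and round only the final result to fp16."""
    return round_to_fp16(sum(round_to_fp16(v) for v in values))


# 1 followed by 4096 copies of 2**-12; the exact sum is 2.
# Near 1 the fp16 spacing is 2**-10, so 1 + 2**-12 rounds back to 1.
values = [1.0] + [2.0**-12] * 4096

print(sum_rounding_each_op(values))  # 1.0
print(sum_rounding_once(values))     # 2.0
```

    Under these assumptions the per-operation simulation stagnates at 1.0, while rounding once returns the exact sum 2.0, so the second approach would overstate the accuracy that genuine fp16 arithmetic achieves on this input.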
    Original language: English
    Journal: SIAM Journal on Scientific Computing
    Volume: 41
    Issue number: 5
    Early online date: 29 Oct 2019
    Publication status: Published - 2019

    Keywords

    • floating-point arithmetic
    • half precision
    • low precision
    • IEEE arithmetic
    • fp16
    • bfloat16
    • subnormal numbers
    • mixed precision
    • simulation
    • rounding error analysis
    • round to nearest
    • directed rounding
    • stochastic rounding
    • bit flips
    • MATLAB
