Fast and small short vector SIMD matrix multiplication kernels for the synergistic processing element of the CELL processor

Wesley Alvaro, Jakub Kurzak, Jack Dongarra

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Matrix multiplication is one of the most common numerical operations, especially in the area of dense linear algebra, where it forms the core of many important algorithms, including solvers of linear systems of equations, least squares problems, and singular value and eigenvalue computations. The STI CELL processor exceeds the capabilities of any other processor available today in terms of peak single precision floating point performance. In order to fully exploit the potential of the CELL processor for a wide range of numerical algorithms, a fast implementation of the matrix multiplication operation is essential. The crucial component is the matrix multiplication kernel crafted for the short vector Single Instruction Multiple Data (SIMD) architecture of the Synergistic Processing Element (SPE) of the CELL processor. In this paper, single precision matrix multiplication kernels are presented implementing the C = C - A × B^T operation and the C = C - A × B operation for matrices of size 64 × 64 elements. For the latter case, a performance of 25.55 Gflop/s is reported, or 99.80 percent of the peak, using as little as 5.9 KB of storage for code and auxiliary data structures. © 2008 Springer-Verlag Berlin Heidelberg.
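As a point of reference for the two operations named in the abstract, the following is a minimal scalar sketch in portable C. It is not the SIMD kernel from the paper; it only spells out what C = C - A × B^T and C = C - A × B compute on one 64 × 64 single precision tile. The function names, the `TILE` macro, and the row-major layout are assumptions made for illustration.

```c
#include <stddef.h>

#define TILE 64 /* tile edge length, matching the 64 x 64 size in the abstract */

/* Hypothetical reference routine: C = C - A * B^T on row-major TILE x TILE tiles. */
void tile_sub_abt(float *c, const float *a, const float *b)
{
    for (size_t i = 0; i < TILE; i++) {
        for (size_t j = 0; j < TILE; j++) {
            float acc = 0.0f;
            /* Row i of A dotted with row j of B, i.e. column j of B^T. */
            for (size_t k = 0; k < TILE; k++) {
                acc += a[i * TILE + k] * b[j * TILE + k];
            }
            c[i * TILE + j] -= acc;
        }
    }
}

/* Hypothetical reference routine: C = C - A * B on row-major TILE x TILE tiles. */
void tile_sub_ab(float *c, const float *a, const float *b)
{
    for (size_t i = 0; i < TILE; i++) {
        for (size_t j = 0; j < TILE; j++) {
            float acc = 0.0f;
            /* Row i of A dotted with column j of B. */
            for (size_t k = 0; k < TILE; k++) {
                acc += a[i * TILE + k] * b[k * TILE + j];
            }
            c[i * TILE + j] -= acc;
        }
    }
}
```

Each call performs 2 × 64³ = 524,288 floating point operations (one multiply and one add per inner-loop step). A scalar loop like this is useful mainly as a correctness check against an optimized kernel.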
    Original language: English
    Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Lect. Notes Comput. Sci.
    Publisher: Springer Nature
    Pages: 935-944
    Number of pages: 9
    Volume: 5101
    ISBN (Print): 3540693831, 9783540693833
    Publication status: Published - 2008
    Event: 8th International Conference on Computational Science, ICCS 2008 - Krakow
    Duration: 1 Jul 2008 → …
    http://dblp.uni-trier.de/db/conf/iccS/iccS2008-1.html#AlvaroKD08
    http://dblp.uni-trier.de/rec/bibtex/conf/iccS/AlvaroKD08.xml
    http://dblp.uni-trier.de/rec/bibtex/conf/iccS/AlvaroKD08

    Publication series

    Name: Lecture Notes in Computer Science

    Conference

    Conference: 8th International Conference on Computational Science, ICCS 2008
    City: Krakow
    Period: 1/07/08 → …