51. Accelerating the general band matrix multiplication using graphics processors
- Author
- Ernesto Dufrechou, Pablo Ezzatti, Alfredo Remón, Peter Benner, and Enrique S. Quintana-Ortí
- Subjects
- Matrix (mathematics), Band matrix, Computer science, MathematicsofComputing_NUMERICALANALYSIS, Computer Science::Mathematical Software, Hardware acceleration, Multiplication, Parallel computing, General-purpose computing on graphics processing units, Matrix addition, Matrix chain multiplication, Matrix multiplication
- Abstract
- In this paper, we leverage the intrinsic data-parallelism of the band matrix-matrix product to accelerate this operation on Graphics Processing Units (GPUs). In particular, we propose a Level-3 BLAS style algorithm to tackle the band matrix-matrix product and implement two GPU-based versions that off-load the most expensive computations (i.e., general dense matrix-matrix multiplication, triangular matrix-matrix multiplication, and matrix addition) to the hardware accelerator. Results collected on GPUs from the two most recent NVIDIA generations ("Fermi" and "Kepler") and on a complete set of benchmark cases (which differ in the matrix dimensions and bandwidth) show that the GPU-enabled implementations deliver a notable reduction of the execution time.
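The idea of casting a band matrix-matrix product in terms of dense block products can be illustrated with a minimal CPU sketch. This is not the authors' algorithm or their GPU code: it assumes a square band matrix `A` held as a full NumPy array with `kl` subdiagonals and `ku` superdiagonals, and it treats each column block of the band as a single dense slab (zeros in the triangular corners included) rather than splitting it into triangular and dense parts. The names `make_band`, `band_matmul`, and `block` are hypothetical.

```python
import numpy as np


def make_band(n, kl, ku, seed=0):
    """Build a random n x n matrix with kl subdiagonals and ku superdiagonals."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    i, j = np.indices((n, n))
    # Zero every entry that lies outside the band.
    A[(i - j > kl) | (j - i > ku)] = 0.0
    return A


def band_matmul(A, B, kl, ku, block=4):
    """Compute C = A @ B, touching only the rows of A that intersect the band.

    Assumption: A is square and its out-of-band entries are zero.
    """
    n = A.shape[0]
    C = np.zeros((n, B.shape[1]), dtype=np.result_type(A, B))
    for j0 in range(0, n, block):
        j1 = min(j0 + block, n)
        # Columns j0..j1-1 are nonzero only in rows j0-ku .. j1-1+kl.
        r0 = max(0, j0 - ku)
        r1 = min(n, j1 + kl)
        # One dense matrix-matrix product per column block, accumulated into C.
        C[r0:r1, :] += A[r0:r1, j0:j1] @ B[j0:j1, :]
    return C


if __name__ == "__main__":
    A = make_band(12, kl=2, ku=1)
    B = np.random.default_rng(1).standard_normal((12, 5))
    print(np.allclose(band_matmul(A, B, kl=2, ku=1), A @ B))
```

Each loop iteration is a dense product on a slab of at most `block + kl + ku` rows, which is the kind of operation a Level-3 BLAS kernel or a GPU handles efficiently; a larger `block` means fewer, larger products at the cost of multiplying more zeros in the slab corners.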
- Published
- 2014