
High-Performance Tensor Contractions for GPUs.

Authors :
Abdelfattah, Ahmad
Baboulin, Marc
Dobrev, Veselin
Dongarra, Jack
Earl, Christopher
Falcou, Joel
Haidar, Azzam
Karlin, Ian
Kolev, Tzanio
Masliah, Ian
Tomov, Stanimire
Source :
Procedia Computer Science; 2016, Vol. 80, p108-118, 11p
Publication Year :
2016

Abstract

We present a computational framework for high-performance tensor contractions on GPUs. High performance is difficult to obtain using existing libraries, especially for many independent contractions where each contraction is very small, e.g., sub-vector/warp in size. However, using our framework to batch contractions plus application-specifics, we demonstrate close to peak performance results. In particular, to accelerate large-scale tensor-formulated high-order finite element method (FEM) simulations, which is the main focus and motivation for this work, we represent contractions as tensor index reordering plus matrix-matrix multiplications (GEMMs). This is a key factor to achieve algorithmically many-fold acceleration (vs. not using it) due to possible reuse of data loaded in fast memory. In addition to using this context knowledge, we design tensor data-structures, tensor algebra interfaces, and new tensor contraction algorithms and implementations to achieve 90+% of a theoretically derived peak on GPUs. On a K40c GPU for contractions resulting in GEMMs on square matrices of size 8 for example, we are 2.8× faster than CUBLAS, and 8.5× faster than MKL on 16 cores of Intel Xeon E5-2670 (Sandy Bridge) 2.60GHz CPUs. Finally, we apply autotuning and code generation techniques to simplify tuning and provide an architecture-aware, user-friendly interface. [ABSTRACT FROM AUTHOR]
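
Illustrative note (not part of the authors' abstract): the idea of expressing a contraction as an index reordering followed by a GEMM can be sketched in a few lines of plain Python. The contraction chosen here, D[i][j][k] = sum over l of A[i][l] * T[j][l][k], and the function names gemm, contract_direct, contract_via_gemm, and batched_contract are assumptions made for illustration; they are not the paper's interface, and the sketch says nothing about the GPU kernels, batching strategy, or autotuning the abstract describes.

```python
# Sketch only: a small contraction rewritten as "reorder indices, then GEMM".
# Contraction (an assumed example): D[i][j][k] = sum_l A[i][l] * T[j][l][k]

def gemm(X, Y):
    """Plain matrix product of nested lists: (m x p) times (p x n)."""
    m, p, n = len(X), len(Y), len(Y[0])
    return [[sum(X[r][q] * Y[q][c] for q in range(p)) for c in range(n)]
            for r in range(m)]

def contract_direct(A, T):
    """Reference result: evaluate the contraction with explicit loops."""
    I, L = len(A), len(A[0])
    J, K = len(T), len(T[0][0])
    return [[[sum(A[i][l] * T[j][l][k] for l in range(L))
              for k in range(K)]
             for j in range(J)]
            for i in range(I)]

def contract_via_gemm(A, T):
    """Same contraction as one index reordering plus one GEMM."""
    I, L = len(A), len(A[0])
    J, K = len(T), len(T[0][0])
    # Reorder T[j][l][k] into a matrix M[l][j*K + k] so that the contracted
    # index l becomes the row index of the right-hand GEMM operand.
    M = [[T[j][l][k] for j in range(J) for k in range(K)] for l in range(L)]
    P = gemm(A, M)  # shape: I x (J*K)
    # Unflatten the fused column index back into (j, k).
    return [[[P[i][j * K + k] for k in range(K)]
             for j in range(J)]
            for i in range(I)]

def batched_contract(A, batch):
    """Apply the same small contraction to many independent tensors."""
    return [contract_via_gemm(A, T) for T in batch]
```

In this sketch the reordering step is what lets a single matrix product cover all (j, k) pairs at once; a batched routine then repeats that small product over many independent tensors, which is the situation the abstract describes for very small contractions.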

Details

Language :
English
ISSN :
18770509
Volume :
80
Database :
Supplemental Index
Journal :
Procedia Computer Science
Publication Type :
Academic Journal
Accession number :
115845000
Full Text :
https://doi.org/10.1016/j.procs.2016.05.302