
Efficiently solving tri-diagonal system by chunked cyclic reduction and single-GPU shared memory

Authors :
Jinhang Yu
Di Zhao
Source :
The Journal of Supercomputing. 71:369-390
Publication Year :
2014
Publisher :
Springer Science and Business Media LLC, 2014.

Abstract

Tri-diagonal systems arise in dynamic problems such as fluid simulation, and solving them efficiently is important for the success of these applications. In this paper, we develop a chunked cyclic reduction solver that runs entirely in GPU shared memory, subject to the capacity constraint of the shared memory. Computational results show that the solver is highly efficient on an Nvidia TITAN with 48 KB of shared memory: it solves a tri-diagonal system with a 262,144-by-262,144 coefficient matrix in 1.768 ms. The results also show that the solver scales well with the size of the coefficient matrix and of the reduced systems. Altogether, because it is built entirely on GPU shared memory, our solver may be faster than existing GPU solvers owing to the efficiency of shared memory, although its solubility is smaller than that of existing GPU solvers because of the shared memory capacity constraint. Here, solubility means the maximum coefficient-matrix size of a tri-diagonal system that our solver can solve.
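For readers unfamiliar with the underlying technique, the following is a minimal serial sketch of plain cyclic reduction in Python. It is not the authors' chunked shared-memory GPU kernel, whose details this record does not give. The function name `cyclic_reduction`, the array layout (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, `d` right-hand side) and the test system are illustrative assumptions. The sketch also assumes a diagonally dominant matrix, since it does no pivoting.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] by cyclic reduction.

    Assumptions (illustrative, not from the paper): a[0] == 0, c[-1] == 0,
    and the matrix is diagonally dominant so no pivoting is needed.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Forward reduction: eliminate the even-indexed unknowns so that only
    # the odd-indexed ones remain, giving a tri-diagonal system of size n // 2.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        has_next = i + 1 < n
        alpha = a[i] / b[i - 1]                       # removes x[i-1]
        gamma = c[i] / b[i + 1] if has_next else 0.0  # removes x[i+1]
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1]
                  - (gamma * a[i + 1] if has_next else 0.0))
        rc.append(-gamma * c[i + 1] if has_next else 0.0)
        rd.append(d[i] - alpha * d[i - 1]
                  - (gamma * d[i + 1] if has_next else 0.0))

    # Solve the reduced system for the odd-indexed unknowns.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back substitution: recover each even-indexed unknown from its neighbours.
    for i in range(0, n, 2):
        rhs = d[i]
        if i > 0:
            rhs -= a[i] * x[i - 1]
        if i + 1 < n:
            rhs -= c[i] * x[i + 1]
        x[i] = rhs / b[i]
    return x


def make_system(x_true):
    """Build a hypothetical diagonally dominant test system with known solution."""
    n = len(x_true)
    a = [0.0] + [-1.0] * (n - 1)
    b = [4.0] * n
    c = [-1.0] * (n - 1) + [0.0]
    d = []
    for i in range(n):
        v = b[i] * x_true[i]
        if i > 0:
            v += a[i] * x_true[i - 1]
        if i + 1 < n:
            v += c[i] * x_true[i + 1]
        d.append(v)
    return a, b, c, d


if __name__ == "__main__":
    x_true = [float(i + 1) for i in range(8)]
    x = cyclic_reduction(*make_system(x_true))
    print(max(abs(u - v) for u, v in zip(x, x_true)))
```

Each reduction level halves the number of unknowns, and the eliminations within a level are independent of one another. That independence is what makes the method a natural fit for one-thread-per-equation GPU execution. On a GPU, the working arrays at each level would have to fit in the shared memory budget, which is the capacity constraint the abstract refers to.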

Details

ISSN :
1573-0484 and 0920-8542
Volume :
71
Database :
OpenAIRE
Journal :
The Journal of Supercomputing
Accession number :
edsair.doi...........03041576c217b17ba0da0f2e1e369fc9
Full Text :
https://doi.org/10.1007/s11227-014-1299-2