Efficiently solving tri-diagonal system by chunked cyclic reduction and single-GPU shared memory
- Source :
- The Journal of Supercomputing. 71:369-390
- Publication Year :
- 2014
- Publisher :
- Springer Science and Business Media LLC, 2014.
Abstract
- Tri-diagonal systems arise in dynamic problems such as fluid simulation, and solving them efficiently is important to the success of these applications. In this paper, we develop a chunked cyclic reduction solver that runs entirely in GPU shared memory, subject to the capacity limit of that memory. Computational results on an Nvidia TITAN with 48 KB of shared memory show that the solver is highly efficient: it solves a tri-diagonal system with a 262,144-by-262,144 coefficient matrix in 1.768 ms. The results also show that the solver scales well with the size of the coefficient matrix and of the reduced systems. Because it is built entirely on GPU shared memory, our solver may be faster than existing GPU solvers, owing to the efficiency of shared memory; however, its solubility is smaller than theirs because of the shared-memory capacity constraint. Here, solubility means the largest coefficient-matrix size of a tri-diagonal system that our solver can solve.
- Subjects :
- Computer science
Parallel algorithm
Uniform memory access
Parallel computing
Solver
Theoretical Computer Science
Computer Science::Performance
Shared memory
Hardware and Architecture
Computer Science::Mathematical Software
General-purpose computing on graphics processing units
Coefficient matrix
Software
Information Systems
Cyclic reduction
Details
- ISSN :
- 1573-0484 and 0920-8542
- Volume :
- 71
- Database :
- OpenAIRE
- Journal :
- The Journal of Supercomputing
- Accession number :
- edsair.doi...........03041576c217b17ba0da0f2e1e369fc9
- Full Text :
- https://doi.org/10.1007/s11227-014-1299-2
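Illustrative note on the abstract: the record does not reproduce the paper's algorithm, so the following is only a minimal serial sketch of textbook cyclic reduction, the method that the chunked GPU solver builds on. It is not the authors' chunked, shared-memory implementation. The function names `cyclic_reduction` and `tridiag_matvec` are hypothetical, and the sketch assumes a diagonally dominant system with `a[0] == 0` and `c[n-1] == 0`, since it does no pivoting.

```python
def tridiag_matvec(a, b, c, x):
    """Multiply a tri-diagonal matrix (sub a, main b, super c) by x."""
    n = len(b)
    y = []
    for i in range(n):
        s = b[i] * x[i]
        if i > 0:
            s += a[i] * x[i - 1]
        if i + 1 < n:
            s += c[i] * x[i + 1]
        y.append(s)
    return y


def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] recursively.

    Assumes a[0] == 0, c[n-1] == 0 and nonzero pivots (no pivoting).
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Forward reduction: eliminate the even-indexed unknowns so that
    # only the odd-indexed ones remain in a half-size tri-diagonal system.
    odd = range(1, n, 2)
    ra, rb, rc, rd = [], [], [], []
    for i in odd:
        has_next = i + 1 < n
        alpha = a[i] / b[i - 1]
        gamma = c[i] / b[i + 1] if has_next else 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1]
                  - (gamma * a[i + 1] if has_next else 0.0))
        rc.append(-gamma * c[i + 1] if has_next else 0.0)
        rd.append(d[i] - alpha * d[i - 1]
                  - (gamma * d[i + 1] if has_next else 0.0))

    x_odd = cyclic_reduction(ra, rb, rc, rd)

    # Back substitution: recover each even-indexed unknown from its
    # already-known odd-indexed neighbours.
    x = [0.0] * n
    for k, i in enumerate(odd):
        x[i] = x_odd[k]
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x


if __name__ == "__main__":
    n = 8
    a = [0.0] + [-1.0] * (n - 1)
    b = [4.0] * n
    c = [-1.0] * (n - 1) + [0.0]
    x_true = [float(i) for i in range(1, n + 1)]
    d = tridiag_matvec(a, b, c, x_true)
    x = cyclic_reduction(a, b, c, d)
    print(max(abs(u - v) for u, v in zip(x, x_true)))
```

Within one reduction level, each odd-indexed row is updated independently of the others, which is what makes the method attractive for GPU threads. How the paper splits the system into chunks that fit the shared-memory capacity is not described in this record and is not modelled here.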