Pushing Memory Bandwidth Limitations Through Efficient Implementations of Block-Krylov Space Solvers on GPUs
- Source :
- Comp. Phys. Comm. 2018
- Publication Year :
- 2017
Abstract
- Lattice quantum chromodynamics simulations in nuclear physics have benefited from a tremendous number of algorithmic advances such as multigrid and eigenvector deflation. These improve the time to solution but do not alleviate the intrinsic memory-bandwidth constraints of the matrix-vector operation dominating iterative solvers. Batching this operation for multiple vectors and exploiting cache and register blocking can yield a super-linear speed up. Block-Krylov solvers can naturally take advantage of such batched matrix-vector operations, further reducing the iterations to solution by sharing the Krylov space between solves. However, practical implementations typically suffer from the quadratic scaling in the number of vector-vector operations. Using the QUDA library, we present an implementation of a block-CG solver on NVIDIA GPUs which reduces the memory-bandwidth complexity of vector-vector operations from quadratic to linear. We present results for the HISQ discretization, showing a 5x speedup compared to highly-optimized independent Krylov solves on NVIDIA's SaturnV cluster.
- Comment: 15 pages, 14 figures, in press
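For readers unfamiliar with the block-CG idea the abstract refers to, the following is a minimal NumPy sketch of a textbook block conjugate gradient iteration. It is not the QUDA implementation described in the paper: the function name `block_cg`, the small dense symmetric positive-definite test matrix, and the tolerance are all assumptions chosen for illustration. It shows the two features the abstract mentions: a single batched matrix-vector product `A @ P` applied to all right-hand sides at once, and the small s-by-s matrices (`P.T @ Q`, `R.T @ R`) whose cost grows quadratically with the number of right-hand sides s.

```python
import numpy as np


def block_cg(A, B, tol=1e-8, max_iter=200):
    """Textbook block CG for A X = B with A symmetric positive definite.

    A: (n, n) array, B: (n, s) array holding s right-hand sides.
    Returns the solution block X and the number of iterations used.
    Illustrative sketch only; not the algorithm variant used in QUDA.
    """
    X = np.zeros_like(B)
    R = B - A @ X              # block residual, shape (n, s)
    P = R.copy()               # block search directions
    rho = R.T @ R              # s x s Gram matrix (vector-vector work, O(s^2))
    b_norm = np.linalg.norm(B, axis=0)

    for it in range(max_iter):
        # Stop once every column meets the relative residual tolerance.
        if np.all(np.linalg.norm(R, axis=0) <= tol * b_norm):
            return X, it
        Q = A @ P                               # one batched matrix-vector product
        alpha = np.linalg.solve(P.T @ Q, rho)   # s x s step-size matrix
        X += P @ alpha
        R -= Q @ alpha
        rho_new = R.T @ R
        beta = np.linalg.solve(rho, rho_new)    # s x s direction-update matrix
        P = R + P @ beta
        rho = rho_new
    return X, max_iter


# Hypothetical test problem: a well-conditioned dense SPD matrix stands in
# for the lattice operator, with s = 4 right-hand sides.
rng = np.random.default_rng(0)
n, s = 64, 4
M = rng.standard_normal((n, n))
A = M @ M.T / n + np.eye(n)
B = rng.standard_normal((n, s))

X, iters = block_cg(A, B)
rel_res = np.linalg.norm(A @ X - B) / np.linalg.norm(B)
print(f"iterations: {iters}, relative residual: {rel_res:.2e}")
```

In this sketch all s systems share one search space, which is what allows a block method to need fewer iterations than s independent CG solves; the price is the extra s-by-s linear algebra per iteration.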
- Subjects :
- High Energy Physics - Lattice
- Physics - Computational Physics
Details
- Database :
- arXiv
- Journal :
- Comp. Phys. Comm. 2018
- Publication Type :
- Report
- Accession number :
- edsarx.1710.09745
- Document Type :
- Working Paper
- Full Text :
- https://doi.org/10.1016/j.cpc.2018.06.019