201. A run-time optimization approach for reducing data movements using locality-aware searching
- Author
Endong Wang, Liang Li, Xiaoshe Dong, Xingjun Zhang, Tao Ju, and Yan Kang
- Subjects
Computer science, Locality, Parallel computing, Reuse, Bottleneck, Theoretical Computer Science, Hardware and Architecture, Compiler, General-purpose computing on graphics processing units, Performance improvement, Software, Information Systems, Abstraction (linguistics)
- Abstract
The CPU-GPU communication bottleneck limits the performance improvement of GPU applications in heterogeneous GPGPU systems and is usually handled by data reuse optimization. This paper analyzes data reuse through a DAG abstraction and derives rules showing that run-time data reuse optimization can effectively relieve the bottleneck. Based on these rules, the paper proposes a run-time optimization framework for data reuse, called R-Tracker. R-Tracker uses a locality-aware searching approach to handle reuses. It not only implements data reuse optimization at low cost but also performs the searching, the data transfers, and the GPU computation concurrently. R-Tracker relaxes the constraints required by compiler-based approaches and thus achieves a better reuse effect. The experimental results show that R-Tracker improves performance by 1.77-16.42% over the compiler-based approach OpenMPC and by 1.40-8.39% over CGCM in single-node execution, and by 48.78-60% over CGCM in multi-node execution.
- Published
- 2014