CRAT: Enabling Coordinated Register Allocation and Thread-Level Parallelism Optimization for GPUs.

Authors :
Xie, Xiaolong
Liang, Yun
Li, Xiuhong
Wu, Yudong
Sun, Guangyu
Wang, Tao
Fan, Dongrui
Source :
IEEE Transactions on Computers; Jun2018, Vol. 67 Issue 6, p890-897, 8p
Publication Year :
2018

Abstract

The key to high performance on GPUs lies in massive threading, which enables thread switching and hides long latencies. GPUs are equipped with a large register file to enable fast context switching. However, thread throttling techniques designed to mitigate cache contention lead to under-utilization of registers. Register allocation is a significant factor for performance, as it not only determines single-thread performance but also indirectly affects thread-level parallelism (TLP). In this paper, we propose Coordinated Register Allocation and Thread-level parallelism (CRAT) to explore the optimization space of register allocation and TLP management on GPUs. CRAT employs both compile-time (CRAT-static) and run-time (CRAT-dyn) techniques to exhaust the design space. CRAT-static works statically to explore the trade-off between TLP and register allocation, and CRAT-dyn exploits dynamic register allocation for further improvement. Experiments indicate that CRAT-static achieves an average 1.25X speedup over an existing TLP management technique. On four register-limited applications, CRAT-dyn further improves the speedup of CRAT-static from 1.51X to 1.70X. [ABSTRACT FROM PUBLISHER]
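
The coupling between per-thread register usage and TLP that the abstract describes can be illustrated with a small occupancy calculation. This is a minimal sketch and not the CRAT algorithm: the function names are hypothetical, the hardware figures (a 65536-entry register file and 2048 resident threads per multiprocessor) are assumed for illustration and are not taken from the paper, and register allocation granularity and other limits such as shared memory are ignored.

```python
# Illustrative only: all names and hardware figures below are assumptions,
# not values or methods from the CRAT paper.

REGFILE_SIZE = 65536        # assumed registers per multiprocessor
MAX_THREADS_PER_SM = 2048   # assumed resident-thread limit per multiprocessor


def max_resident_blocks(regs_per_thread, threads_per_block,
                        regfile_size=REGFILE_SIZE,
                        max_threads=MAX_THREADS_PER_SM):
    """Blocks that fit on one multiprocessor, limited by registers and threads."""
    if regs_per_thread <= 0 or threads_per_block <= 0:
        raise ValueError("regs_per_thread and threads_per_block must be positive")
    by_registers = regfile_size // (regs_per_thread * threads_per_block)
    by_threads = max_threads // threads_per_block
    return min(by_registers, by_threads)


def unused_registers(blocks, regs_per_thread, threads_per_block,
                     regfile_size=REGFILE_SIZE):
    """Registers left idle when only `blocks` blocks are resident."""
    used = blocks * regs_per_thread * threads_per_block
    if used > regfile_size:
        raise ValueError("configuration does not fit in the register file")
    return regfile_size - used


def register_budget_per_thread(blocks, threads_per_block,
                               regfile_size=REGFILE_SIZE):
    """Largest per-thread register count that still lets `blocks` blocks fit."""
    if blocks <= 0 or threads_per_block <= 0:
        raise ValueError("blocks and threads_per_block must be positive")
    return regfile_size // (blocks * threads_per_block)


if __name__ == "__main__":
    # More registers per thread means fewer resident blocks (lower TLP).
    for regs in (32, 64, 128):
        blocks = max_resident_blocks(regs, 256)
        print(regs, "regs/thread ->", blocks, "blocks,", blocks * 256, "threads")
    # Throttling to 2 blocks at 32 regs/thread leaves most registers idle.
    print("idle registers:", unused_registers(2, 32, 256))
    print("budget per thread at 2 blocks:", register_budget_per_thread(2, 256))
```

Under these assumed figures, throttling a kernel to fewer resident blocks leaves part of the register file idle, and that idle capacity could in principle be given back to each thread as a larger register budget. This is the kind of trade-off the abstract says CRAT explores.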

Details

Language :
English
ISSN :
00189340
Volume :
67
Issue :
6
Database :
Complementary Index
Journal :
IEEE Transactions on Computers
Publication Type :
Academic Journal
Accession number :
129614861
Full Text :
https://doi.org/10.1109/TC.2017.2776272