
StaleLearn: Learning Acceleration with Asynchronous Synchronization Between Model Replicas on PIM.

Authors :
Lee, Joo Hwan
Kim, Hyesoon
Source :
IEEE Transactions on Computers. Jun 2018, Vol. 67, Issue 6, p861-873. 13p.
Publication Year :
2018

Abstract

GPUs have become popular for machine learning because of the large amount of parallelism found in learning workloads. While GPUs have been effective for many learning tasks, many GPU learning applications still have low execution efficiency due to sparse data. Sparse data induces divergent memory accesses with low locality, so a large fraction of execution time is spent transferring data across the memory hierarchy. Although considerable effort has been devoted to reducing memory divergence, iterative-convergent learning provides a unique opportunity to reach the full potential of modern GPUs because it allows different threads to continue computation using stale values. In this paper, we propose StaleLearn, a learning acceleration mechanism that reduces the memory divergence overhead of GPU learning by exploiting the stale value tolerance of iterative-convergent learning. Based on this tolerance, StaleLearn transforms the problem of divergent memory accesses into a synchronization problem by replicating the model, and it reduces the synchronization overhead through asynchronous synchronization on a Processor-in-Memory (PIM). The stale value tolerance also enables a clean task decomposition between the GPU and the PIM, which effectively exploits parallelism between the two. On average, our approach accelerates representative GPU learning applications by 3.17 times with existing PIM proposals. [ABSTRACT FROM PUBLISHER]
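To make the core idea concrete, below is a minimal sketch of stale-value-tolerant asynchronous synchronization between model replicas. It is not the paper's GPU/PIM implementation: workers here are Python threads standing in for GPU threads, the background synchronizer thread stands in for the PIM-side merge, and all names (worker, synchronizer, SYNC_PERIOD, etc.) are hypothetical. Each worker runs SGD against its own private replica and may read values that are stale with respect to the other replica; a background thread asynchronously averages the replicas, so the workers never block on synchronization.

```python
# Illustrative sketch only: stale-value-tolerant asynchronous
# synchronization between model replicas on a toy least-squares
# problem. Workers update private replicas; a background thread
# (a stand-in for the PIM-side synchronizer) merges them
# asynchronously while workers keep computing on stale values.
import threading
import time
import numpy as np

DIM = 8              # model dimension (toy size)
STEPS = 200          # SGD steps per worker
SYNC_PERIOD = 0.001  # seconds between asynchronous merges

rng = np.random.default_rng(0)
true_w = rng.normal(size=DIM)  # target weights of the toy problem

replicas = [np.zeros(DIM), np.zeros(DIM)]  # one replica per worker
locks = [threading.Lock(), threading.Lock()]
done = threading.Event()

def worker(idx: int) -> None:
    """Run SGD on a private replica; its view of the other replica is stale."""
    w = replicas[idx]
    for _ in range(STEPS):
        x = rng.normal(size=DIM)
        y = true_w @ x
        with locks[idx]:
            grad = (w @ x - y) * x  # gradient of 0.5*(w.x - y)^2
            w -= 0.05 * grad        # update the local replica only
        time.sleep(0.0001)          # emulate computation latency

def synchronizer() -> None:
    """Merge replicas in the background, decoupled from worker progress."""
    while not done.is_set():
        with locks[0], locks[1]:
            avg = (replicas[0] + replicas[1]) / 2.0
            replicas[0][:] = avg
            replicas[1][:] = avg
        time.sleep(SYNC_PERIOD)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
sync = threading.Thread(target=synchronizer)
sync.start()
for t in threads:
    t.start()
for t in threads:
    t.join()
done.set()
sync.join()
print("replica-0 error:", np.linalg.norm(replicas[0] - true_w))
```

Because iterative-convergent learning tolerates stale reads, the merge can run at its own pace without stalling the workers; this is the same decoupling that, in the paper's setting, lets the GPU compute while the PIM performs synchronization in parallel.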

Details

Language :
English
ISSN :
0018-9340
Volume :
67
Issue :
6
Database :
Academic Search Index
Journal :
IEEE Transactions on Computers
Publication Type :
Academic Journal
Accession number :
129614865
Full Text :
https://doi.org/10.1109/TC.2017.2780237