StaleLearn: Learning Acceleration with Asynchronous Synchronization Between Model Replicas on PIM.
- Source :
- IEEE Transactions on Computers, Jun. 2018, Vol. 67, Issue 6, p861-873. 13p.
- Publication Year :
- 2018
Abstract
- The GPU has become popular thanks to the large amount of parallelism found in learning workloads. While the GPU has been effective for many learning tasks, many GPU learning applications still have low execution efficiency due to sparse data. Sparse data induces divergent memory accesses with low locality, so a large fraction of execution time is spent transferring data across the memory hierarchy. Although considerable effort has been devoted to reducing memory divergence, iterative-convergent learning provides a unique opportunity to achieve the full potential of modern GPUs, in that it allows different threads to continue computation using stale values. In this paper, we propose StaleLearn, a learning acceleration mechanism that reduces the memory divergence overhead of GPU learning by exploiting the stale value tolerance of iterative-convergent learning. Based on this tolerance, StaleLearn transforms the problem of divergent memory accesses into a synchronization problem by replicating the model, and reduces the synchronization overhead through asynchronous synchronization on Processor-in-Memory (PIM). The stale value tolerance also enables a clear task decomposition between the GPU and PIM, which can effectively exploit parallelism between the two. On average, our approach accelerates representative GPU learning applications by 3.17 times with existing PIM proposals. [ABSTRACT FROM PUBLISHER]
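The stale value tolerance the abstract relies on can be illustrated with a minimal sketch (this is an illustrative toy, not the paper's GPU/PIM implementation, and all names in it are invented): two model replicas run SGD steps on a shared least-squares objective independently, and are merged only at periodic synchronization points, so between syncs each replica computes with stale values of the other's updates. Because the process is iterative-convergent, it still reaches the solution.

```python
# Toy sketch of asynchronous synchronization between model replicas.
# Assumption: a simple least-squares problem stands in for a learning task;
# replica averaging stands in for the PIM-side synchronization.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = X @ true_w


def sgd_step(w, idx, lr=0.1):
    # One gradient step on a minibatch of the least-squares loss.
    xb, yb = X[idx], y[idx]
    grad = 2 * xb.T @ (xb @ w - yb) / len(idx)
    return w - lr * grad


# Two replicas start from the same point and drift apart between syncs,
# each working with a stale view of the other's progress.
replicas = [np.zeros(2), np.zeros(2)]
SYNC_PERIOD = 10  # steps between (asynchronous-style) merges

for step in range(200):
    for r in range(2):
        idx = rng.integers(0, len(X), size=16)
        replicas[r] = sgd_step(replicas[r], idx)
    if step % SYNC_PERIOD == SYNC_PERIOD - 1:
        # Merge replicas by averaging: stale updates are reconciled
        # only at synchronization points, not on every access.
        merged = (replicas[0] + replicas[1]) / 2
        replicas = [merged.copy(), merged.copy()]

final = (replicas[0] + replicas[1]) / 2
print(np.round(final, 3))  # converges near true_w = [2, -1] despite staleness
```

The design point this mirrors is that infrequent, asynchronous reconciliation of replicas is much cheaper than keeping every access coherent, and iterative-convergent algorithms absorb the resulting staleness.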
Details
- Language :
- English
- ISSN :
- 0018-9340
- Volume :
- 67
- Issue :
- 6
- Database :
- Academic Search Index
- Journal :
- IEEE Transactions on Computers
- Publication Type :
- Academic Journal
- Accession number :
- 129614865
- Full Text :
- https://doi.org/10.1109/TC.2017.2780237