1. An Incremental Iterative Acceleration Architecture in Distributed Heterogeneous Environments With GPUs for Deep Learning
- Author
- Du Lifan, Li Yang, Zhuo Tang, and Zhang Xuedong
- Subjects
- Distributed computing, Iterative and incremental development, Computer science, Iterative method, Deep learning, Parallel computing, CUDA, Data access, Computational Theory and Mathematics, Hardware and Architecture, Computer cluster, Signal Processing, Central processing unit, Artificial intelligence, Block (data storage)
- Abstract
The parallel computing power of GPUs can substantially accelerate computationally intensive iterative tasks, and offloading part or all of a deep learning workload from the CPU to the GPU is now mainstream. However, the iterative process of such tasks performs many redundant calculations: results that do not change between iterations are recomputed anyway. We therefore propose a GPU-based distributed incremental iterative computing architecture that exploits both distributed parallelism and the GPU memory hierarchy. The architecture supports deep learning and other computationally intensive iterative applications by optimizing data placement and eliminating redundant iterative calculations. To support block-based data partitioning and coalesced memory access on GPUs, we propose GDataSet, an abstract data set. GTracker, a GPU-side incremental iteration manager, is responsible for caching GDataSets on the GPU. To cope with the limited size of on-chip memory, we propose a variable sliding-window mechanism that seeks the best arrangement of blocks between on-chip and off-chip memory, improving both the cache hit rate and data access speed. In addition, a communication channel based on the incremental iterative model supports data transmission and task communication across the cluster. Finally, we implement the architecture on Spark 2.4.1 and CUDA 10.0. Comparative experiments with widely used computationally intensive iterative applications (K-means, LSTM, etc.) show that the incremental iterative acceleration architecture significantly improves the efficiency of iterative computing. (A brief illustrative sketch of the block-caching and incremental-skip ideas follows this record.)
- Published
- 2021
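
The abstract does not describe the internals of GDataSet or GTracker, but the two core ideas it names, staging fixed-size data blocks through on-chip memory with coalesced loads and skipping blocks whose inputs have not changed since the previous iteration, can be sketched in plain CUDA. Everything below (the `incremental_step` kernel, `BLOCK_SIZE`, the per-block `dirty` flags, and the reduction used as a stand-in computation) is a hypothetical illustration under those assumptions, not the paper's implementation.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK_SIZE = 256;  // elements per data block (assumed size)

// One iteration step over a blocked data set. Each CUDA thread block slides
// across its share of data blocks, caching one block at a time in shared
// (on-chip) memory -- a simplified stand-in for the paper's sliding window
// between on-chip and off-chip memory.
__global__ void incremental_step(const float* __restrict__ data,
                                 const unsigned char* __restrict__ dirty,
                                 float* __restrict__ out,
                                 int numBlocks) {
    __shared__ float window[BLOCK_SIZE];  // on-chip cache for one block

    for (int b = blockIdx.x; b < numBlocks; b += gridDim.x) {
        // Incremental iteration: blocks unchanged since the last iteration
        // keep their cached result instead of being recomputed.
        if (!dirty[b]) continue;  // uniform per thread block, so safe

        // Coalesced load: consecutive threads read consecutive addresses.
        window[threadIdx.x] = data[(size_t)b * BLOCK_SIZE + threadIdx.x];
        __syncthreads();

        // Placeholder per-block computation: a shared-memory tree reduction.
        for (int off = BLOCK_SIZE / 2; off > 0; off >>= 1) {
            if (threadIdx.x < off)
                window[threadIdx.x] += window[threadIdx.x + off];
            __syncthreads();
        }
        if (threadIdx.x == 0) out[b] = window[0];
        __syncthreads();  // keep window intact until all threads finish
    }
}

int main() {
    const int numBlocks = 1024;
    const size_t n = (size_t)numBlocks * BLOCK_SIZE;

    std::vector<float> h_data(n, 1.0f);
    std::vector<unsigned char> h_dirty(numBlocks, 1);  // first pass: all dirty

    float *d_data, *d_out;
    unsigned char *d_dirty;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_out, numBlocks * sizeof(float));
    cudaMalloc(&d_dirty, numBlocks);
    cudaMemcpy(d_data, h_data.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_dirty, h_dirty.data(), numBlocks, cudaMemcpyHostToDevice);

    incremental_step<<<128, BLOCK_SIZE>>>(d_data, d_dirty, d_out, numBlocks);
    cudaDeviceSynchronize();

    float first;
    cudaMemcpy(&first, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("block 0 result: %f\n", first);  // expect 256.0 for all-ones input

    cudaFree(d_data); cudaFree(d_out); cudaFree(d_dirty);
    return 0;
}
```

In the full architecture, something like GTracker would maintain the dirty flags across iterations and the window sizing would adapt to the data (the "variable" part of the sliding window); both are fixed here for brevity.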