A Communication Efficient ADMM-based Distributed Algorithm Using Two-Dimensional Torus Grouping AllReduce
- Source :
- Data Science and Engineering, Vol 8, Iss 1, Pp 61-72 (2023)
- Publication Year :
- 2023
- Publisher :
- SpringerOpen, 2023.
-
Abstract
- Large-scale distributed training mainly consists of sub-model parallel training and parameter synchronization. As the number of training workers grows, parameter synchronization becomes increasingly inefficient. To tackle this problem, we first propose 2D-TGA, a grouping AllReduce method based on the two-dimensional torus topology, which synchronizes model parameters in groups and makes full use of the available bandwidth. Secondly, we propose a distributed algorithm, 2D-TGA-ADMM, which combines 2D-TGA with the alternating direction method of multipliers (ADMM); it focuses on sub-model training and reduces the wait time among workers during synchronization. Finally, experimental results on the Tianhe-2 supercomputing platform show that, compared with MPI_Allreduce, 2D-TGA shortens the synchronization wait time by 33%.
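The abstract describes 2D-TGA only at a high level. Assuming the grouping amounts to an AllReduce within each row group of the torus followed by one within each column group (an assumption on our part; the paper gives the precise MPI-based scheme), the idea can be sketched as a single-process simulation:

```python
# Minimal single-process simulation of a two-dimensional torus grouping
# AllReduce. Workers sit on an R x C grid; each first sums parameters
# within its row group, then within its column group. After both phases
# every worker holds the global sum. This illustrates the row/column
# grouping idea only, not the paper's actual MPI implementation.

def grouped_allreduce(grid):
    """grid: R x C nested list of equal-length parameter vectors."""
    R, C = len(grid), len(grid[0])
    # Phase 1: AllReduce within each row group.
    for r in range(R):
        row_sum = [sum(vals) for vals in zip(*grid[r])]
        for c in range(C):
            grid[r][c] = list(row_sum)
    # Phase 2: AllReduce within each column group.
    for c in range(C):
        col_sum = [sum(vals) for vals in zip(*(grid[r][c] for r in range(R)))]
        for r in range(R):
            grid[r][c] = list(col_sum)
    return grid

# Example: a 2 x 2 torus with scalar parameters 1..4.
# Every worker ends up holding the global sum, [10.0].
workers = [[[1.0], [2.0]], [[3.0], [4.0]]]
grouped_allreduce(workers)
```

Each worker communicates only within its row group and then its column group, so each phase involves far fewer participants than a single flat AllReduce over all workers, which is the source of the reduced synchronization wait time the paper reports.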
Details
- Language :
- English
- ISSN :
- 2364-1185 and 2364-1541
- Volume :
- 8
- Issue :
- 1
- Database :
- Directory of Open Access Journals
- Journal :
- Data Science and Engineering
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.326f786c96534dbf82faf8489cae1202
- Document Type :
- article
- Full Text :
- https://doi.org/10.1007/s41019-022-00202-7