Breaking (Global) Barriers in Parallel Stochastic Optimization With Wait-Avoiding Group Averaging.

Authors :
Li, Shigang
Ben-Nun, Tal
Nadiradze, Giorgi
Di Girolamo, Salvatore
Dryden, Nikoli
Alistarh, Dan
Hoefler, Torsten
Source :
IEEE Transactions on Parallel & Distributed Systems; Jul2021, Vol. 32 Issue 7, p1725-1739, 15p
Publication Year :
2021

Abstract

Deep learning at scale is dominated by communication time. Distributing samples across nodes usually yields the best performance, but poses scaling challenges due to global information dissemination and load imbalance across uneven sample lengths. State-of-the-art decentralized optimizers mitigate the problem, but require more iterations to achieve the same accuracy as their globally-communicating counterparts. We present Wait-Avoiding Group Model Averaging (WAGMA) SGD, a wait-avoiding stochastic optimizer that reduces global communication via subgroup weight exchange. The key insight is a combination of algorithmic changes to the averaging scheme and the use of a group allreduce operation. We prove the convergence of WAGMA-SGD, and empirically show that it retains convergence rates similar to Allreduce-SGD. For evaluation, we train ResNet-50 on ImageNet; Transformer for machine translation; and deep reinforcement learning for navigation at scale. Compared with state-of-the-art decentralized SGD variants, WAGMA-SGD significantly improves training throughput (e.g., 2.1× on 1,024 GPUs for reinforcement learning), and achieves the fastest time-to-solution (e.g., the highest score using the shortest training time for Transformer).
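The core idea of subgroup weight exchange can be illustrated with a toy simulation: workers are partitioned into small groups, each group averages its model weights (a group allreduce), and the partition changes across steps so information still spreads globally over time without a global barrier. The sketch below is a simplified, synchronous illustration of that idea under assumed names (`wagma_step`, rotating partitions), not the authors' actual wait-avoiding implementation.

```python
import numpy as np

def wagma_step(weights, group_size, step):
    """One group-averaging round in the spirit of WAGMA-SGD.

    Workers are split into groups of `group_size`; each group replaces
    its members' weights with the group mean (a group allreduce).
    Rotating the partition each step changes group membership, so
    averages eventually mix across all workers. Simplified sketch:
    synchronous, no gradient updates, no wait-avoidance machinery.
    """
    n = len(weights)
    # Rotate worker ordering so the partition differs from the last step.
    order = np.roll(np.arange(n), step)
    new_weights = list(weights)
    for g in range(0, n, group_size):
        members = order[g:g + group_size]
        avg = np.mean([weights[m] for m in members], axis=0)
        for m in members:
            new_weights[m] = avg
    return new_weights

# Toy usage: 4 workers with scalar "models", averaging in groups of 2.
w = [np.array([0.0]), np.array([4.0]), np.array([8.0]), np.array([12.0])]
w = wagma_step(w, group_size=2, step=0)  # groups {0,1} and {2,3}
w = wagma_step(w, group_size=2, step=1)  # rotated partition mixes the groups
```

After two rotated rounds, every worker in this toy example holds the global mean (6.0), showing how repeated subgroup averaging disseminates information globally without any single all-to-all exchange.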

Details

Language :
English
ISSN :
1045-9219
Volume :
32
Issue :
7
Database :
Complementary Index
Journal :
IEEE Transactions on Parallel & Distributed Systems
Publication Type :
Academic Journal
Accession number :
148970893
Full Text :
https://doi.org/10.1109/TPDS.2020.3040606