FLSGD: free local SGD with parallel synchronization.

Authors :
Ye, Qing
Zhou, Yuhao
Shi, Mingjia
Lv, Jiancheng
Source :
Journal of Supercomputing. Jul2022, Vol. 78 Issue 10, p12410-12433. 24p.
Publication Year :
2022

Abstract

Synchronous parameter algorithms with data parallelism have been successfully used to accelerate the distributed training of deep neural networks (DNNs). However, a prevalent shortcoming of synchronous methods is the computation wasted by mutual waiting among workers of differing performance and by communication delays at each synchronization. To alleviate this drawback, we propose a novel method, free local stochastic gradient descent (FLSGD) with parallel synchronization, to eliminate the waiting and communication overhead. Specifically, the process of distributed DNN training is first modeled as a pipeline consisting of three components: dataset partitioning, local SGD, and parameter updating. Then, a novel adaptive batch size and dataset partition method based on each node's computational performance is employed to eliminate waiting time by keeping the load balanced across the distributed DNN training. Local SGD and parameter updating, including gradient synchronization, are parallelized through one-step gradient delaying to eliminate the communication cost, and the resulting staleness problem is remedied by an appropriate approximation. To the best of our knowledge, this is the first work to address both load imbalance and communication overhead in distributed training. Extensive experiments are conducted with four state-of-the-art DNN models on two image classification datasets (i.e., CIFAR10 and CIFAR100) to demonstrate that FLSGD outperforms the synchronous methods. [ABSTRACT FROM AUTHOR]
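
The load-balancing idea in the abstract can be illustrated with a small sketch. The abstract does not give the authors' actual formulas, so the proportional rule below is an assumption, and the function name `proportional_split` and the throughput figures are hypothetical. The sketch assigns each worker a batch size and a dataset shard proportional to its measured throughput, so that every worker needs roughly the same time per step and the same number of steps per epoch.

```python
from typing import List


def proportional_split(total: int, throughputs: List[float]) -> List[int]:
    """Split `total` items across workers in proportion to throughput.

    Illustrative rule only (an assumption, not the paper's method).
    Shares are rounded with the largest-remainder method so that they
    sum to `total`.
    """
    if total < 0 or not throughputs or any(t <= 0 for t in throughputs):
        raise ValueError("need total >= 0 and strictly positive throughputs")
    scale = sum(throughputs)
    exact = [total * t / scale for t in throughputs]
    shares = [int(x) for x in exact]  # floor, since every x >= 0
    # Give the leftover items to the workers with the largest fractional parts.
    leftover = total - sum(shares)
    order = sorted(range(len(exact)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


# Hypothetical measurements: samples per second for three workers.
throughputs = [100.0, 200.0, 100.0]
batch_sizes = proportional_split(256, throughputs)     # per-step batch sizes
shard_sizes = proportional_split(50_000, throughputs)  # samples per worker
print(batch_sizes, shard_sizes)
```

With these made-up numbers the faster worker receives twice the batch and twice the shard of the others, so the estimated time per step (batch size divided by throughput) is the same on every worker.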

Details

Language :
English
ISSN :
0920-8542
Volume :
78
Issue :
10
Database :
Academic Search Index
Journal :
Journal of Supercomputing
Publication Type :
Academic Journal
Accession number :
157415832
Full Text :
https://doi.org/10.1007/s11227-021-04267-5