
Predicting Throughput of Distributed Stochastic Gradient Descent.

Authors :
Li, Zhuojin
Paolieri, Marco
Golubchik, Leana
Lin, Sung-Han
Yan, Wumo
Source :
IEEE Transactions on Parallel & Distributed Systems; Nov2022, Vol. 33 Issue 11, p2900-2912, 13p
Publication Year :
2022

Abstract

Training jobs of deep neural networks (DNNs) can be accelerated through distributed variants of stochastic gradient descent (SGD), where multiple nodes process training examples and exchange updates. The total throughput of the nodes depends not only on their computing power, but also on their networking speeds and coordination mechanism (synchronous or asynchronous, centralized or decentralized), since communication bottlenecks and stragglers can result in sublinear scaling when additional nodes are provisioned. In this paper, we propose two classes of performance models to predict throughput of distributed SGD: fine-grained models, representing many elementary computation/communication operations and their dependencies; and coarse-grained models, where SGD steps at each node are represented as a sequence of high-level phases without parallelism between computation and communication. Using a PyTorch implementation, real-world DNN models and different cloud environments, our experimental evaluation illustrates that, while fine-grained models are more accurate and can be easily adapted to new variants of distributed SGD, coarse-grained models can provide similarly accurate predictions when augmented with ad hoc heuristics, and their parameters can be estimated with profiling information that is easier to collect.
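To make the distinction between the two model classes concrete, the following is a minimal Python sketch, not taken from the paper: it contrasts a coarse-grained step-time estimate, in which forward, backward, gradient communication, and parameter update run strictly in sequence, with a fine-grained estimate in which per-layer gradient communication may overlap with the remaining backward computation. All function names, timing values, and the per-layer overlap rule are illustrative assumptions, not the authors' models or profiling data.

# Minimal sketch (not the paper's actual models): coarse-grained vs. fine-grained
# step-time estimates for one worker of a distributed SGD job.
# All timing values are hypothetical profiling inputs, in seconds.

def coarse_grained_step_time(t_forward, t_backward, t_comm, t_update):
    """Coarse-grained model: the phases of an SGD step execute strictly in sequence,
    with no overlap between computation and communication."""
    return t_forward + t_backward + t_comm + t_update

def fine_grained_step_time(t_forward, t_backward_per_layer, t_comm_per_layer, t_update):
    """Fine-grained model (simplified): each layer's gradient is sent as soon as its
    backward pass finishes, overlapping with the backward computation of the layers
    that are still pending. Layer lists are in backward execution order."""
    backward_done = 0.0   # time at which backward finishes for the current layer
    comm_done = 0.0       # time at which the network link becomes free again
    for t_bwd, t_comm in zip(t_backward_per_layer, t_comm_per_layer):
        backward_done += t_bwd
        comm_done = max(comm_done, backward_done) + t_comm
    return t_forward + max(backward_done, comm_done) + t_update

if __name__ == "__main__":
    # Hypothetical per-step timings for a 4-layer model on one worker.
    t_fwd, t_upd = 0.020, 0.005
    t_bwd_layers = [0.010, 0.010, 0.010, 0.010]
    t_comm_layers = [0.008, 0.008, 0.008, 0.008]

    coarse = coarse_grained_step_time(t_fwd, sum(t_bwd_layers), sum(t_comm_layers), t_upd)
    fine = fine_grained_step_time(t_fwd, t_bwd_layers, t_comm_layers, t_upd)

    batch_size = 32  # examples processed per step per worker
    print(f"coarse-grained: {batch_size / coarse:.1f} examples/s per worker")
    print(f"fine-grained:   {batch_size / fine:.1f} examples/s per worker")

Under these hypothetical timings, the overlap captured by the fine-grained estimate yields a shorter predicted step time (and hence higher throughput), which illustrates why a purely sequential phase model may need heuristic corrections to match measured throughput.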

Details

Language :
English
ISSN :
1045-9219
Volume :
33
Issue :
11
Database :
Complementary Index
Journal :
IEEE Transactions on Parallel & Distributed Systems
Publication Type :
Academic Journal
Accession number :
157073399
Full Text :
https://doi.org/10.1109/TPDS.2022.3151739