
On Biased Compression for Distributed Learning

Authors :: Beznosikov, Aleksandr
Horváth, Samuel
Richtárik, Peter
Safaryan, Mher
Source :: Journal of Machine Learning Research 2023: https://www.jmlr.org/papers/v24/21-1548.html
Publication Year :: 2020
Abstract :: In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact that biased compressors often show superior performance in practice when compared to the much more studied and understood unbiased compressors, very little is known about them. In this work, we study three classes of biased compression operators, two of which are new, and their performance when applied to (stochastic) gradient descent and distributed (stochastic) gradient descent. We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings. We prove that distributed compressed SGD method, employed with error feedback mechanism, enjoys the ergodic rate $O\left( \delta L \exp \left[-\frac{\mu K}{\delta L}\right] + \frac{(C + \delta D)}{K\mu}\right)$, where $\delta\ge 1$ is a compression parameter which grows when more compression is applied, $L$ and $\mu$ are the smoothness and strong convexity constants, $C$ captures stochastic gradient noise ($C=0$ if full gradients are computed on each node) and $D$ captures the variance of the gradients at the optimum ($D=0$ for over-parameterized models). Further, via a theoretical study of several synthetic and empirical distributions of communicated gradients, we shed light on why and by how much biased compressors outperform their unbiased variants. Finally, we propose several new biased compressors with promising theoretical guarantees and practical performance.
Comment :: 50 pages, 9 figures, 5 tables, 22 theorems and lemmas, 7 new compression operators, 1 algorithm
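The abstract names biased compressors, a compression parameter $\delta$ and an error feedback mechanism without defining them in this record. The following Python sketch illustrates these ideas on a toy problem under stated assumptions; it is not the paper's algorithm. It uses Top-k sparsification as an example of a biased compressor and assumes a contraction-style condition $\|C(x) - x\|^2 \le (1 - 1/\delta)\|x\|^2$, which Top-k satisfies with $\delta = d/k$, so $\delta$ grows as fewer coordinates are kept. It then implements one common form of error feedback for distributed gradient descent. The names `top_k` and `ef_distributed_gd` and the quadratic worker losses are hypothetical, chosen only for illustration.

```python
import random


def top_k(x, k):
    """Biased Top-k compressor: keep the k entries of largest magnitude, zero the rest."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [xi if i in keep else 0.0 for i, xi in enumerate(x)]


def sq_norm(x):
    """Squared Euclidean norm of a list of floats."""
    return sum(v * v for v in x)


def ef_distributed_gd(scales, target, x0, step, k, iters):
    """Distributed gradient descent with error feedback (EF) and Top-k compression.

    Hypothetical toy setup: worker i holds f_i(x) = 0.5 * scales[i] * ||x - target||^2,
    so all workers share the minimizer `target` and their gradients vanish there.
    """
    n, d = len(scales), len(x0)
    x = list(x0)
    memory = [[0.0] * d for _ in range(n)]  # per-worker error memory e_i
    for _ in range(iters):
        messages = []
        for i, s in enumerate(scales):
            grad = [s * (xj - tj) for xj, tj in zip(x, target)]  # full local gradient
            # Add back what compression dropped in earlier rounds, then compress.
            p = [e + step * g for e, g in zip(memory[i], grad)]
            c = top_k(p, k)
            memory[i] = [pj - cj for pj, cj in zip(p, c)]  # store the dropped residual
            messages.append(c)  # only c (k nonzeros) would be communicated
        # The server averages the compressed messages and applies the update.
        x = [xj - sum(m[j] for m in messages) / n for j, xj in enumerate(x)]
    return x


if __name__ == "__main__":
    random.seed(0)
    d, k = 4, 1
    v = [random.gauss(0.0, 1.0) for _ in range(d)]
    delta = d / k  # compression parameter of Top-k under the assumed definition
    err = sq_norm([ci - vi for ci, vi in zip(top_k(v, k), v)])
    print(err <= (1 - 1 / delta) * sq_norm(v) + 1e-12)  # contraction check
    target = [1.0, -2.0, 0.5, 3.0]
    x = ef_distributed_gd([0.5, 1.0, 1.5], target, [0.0] * d, step=0.02, k=k, iters=3000)
    print(sq_norm([xi - ti for xi, ti in zip(x, target)]) ** 0.5)  # distance to optimum
```

In this toy setup full gradients are used ($C = 0$) and all workers share a minimizer, so gradients vanish at the optimum ($D = 0$); only the exponentially decaying term of the abstract's bound is relevant, and the printed distance should be close to zero. The error memory is what lets a biased compressor such as Top-k be used safely: coordinates dropped in one round are not lost but re-enter the next message.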

Subjects :: Computer Science - Machine Learning
Computer Science - Distributed, Parallel, and Cluster Computing
Mathematics - Optimization and Control
Statistics - Machine Learning

Details

Database :: arXiv
Journal :: Journal of Machine Learning Research 2023: https://www.jmlr.org/papers/v24/21-1548.html
Publication Type :: Report
Accession number :: edsarx.2002.12410
Document Type :: Working Paper
