Back to Search Start Over

SGD: General Analysis and Improved Rates

Authors :
Gower, Robert
Loizou, Nicolas
Qian, Xun
Sailanbayev, Alibek
Shulgin, Egor
Richtárik, Peter
Statistical Machine Learning and Parsimony (SIERRA)
Département d'informatique - ENS Paris (DI-ENS)
Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris
Institut National de Recherche en Informatique et en Automatique (Inria)
Signal, Statistique et Apprentissage (S2A)
Laboratoire Traitement et Communication de l'Information (LTCI)
Institut Mines-Télécom [Paris] (IMT)-Télécom Paris-Institut Mines-Télécom [Paris] (IMT)-Télécom Paris
Département Images, Données, Signal (IDS)
Télécom ParisTech
University of Edinburgh
King Abdullah University of Science and Technology (KAUST)
School of Science and Technology [Kazakhstan]
Nazarbayev University [Kazakhstan]
Moscow Institute of Physics and Technology [Moscow] (MIPT)
School of Mathematics - University of Edinburgh
Robert M. Gower acknowledges the support by a public grant as part of the Investissement d’avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH, in a joint call with Gaspard Monge Program for optimization, operations research and their interactions with data sciences.
Département d'informatique de l'École normale supérieure (DI-ENS)
École normale supérieure - Paris (ENS Paris)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris
École normale supérieure - Paris (ENS-PSL)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL)
Source :
International Conference on Machine Learning, International Conference on Machine Learning, Jun 2019, Los Angeles, United States
Publication Year :
2019
Publisher :
HAL CCSD, 2019.

Abstract

We propose a general yet simple theorem describing the convergence of SGD under the arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of variants of SGD, each of which is associated with a specific probability law governing the data selection rule used to form mini-batches. This is the first time such an analysis is performed, and most of our variants of SGD were never explicitly considered in the literature before. Our analysis relies on the recently introduced notion of expected smoothness and does not rely on a uniform bound on the variance of the stochastic gradients. By specializing our theorem to different mini-batching strategies, such as sampling with replacement and independent sampling, we derive exact expressions for the stepsize as a function of the mini-batch size. With this we can also determine the mini-batch size that optimizes the total complexity, and show explicitly that as the variance of the stochastic gradient evaluated at the minimum grows, so does the optimal mini-batch size. For zero variance, the optimal mini-batch size is one. Moreover, we prove insightful stepsize-switching rules which describe when one should switch from a constant to a decreasing stepsize regime.<br />23 pages, 6 figures

Details

Language :
English
Database :
OpenAIRE
Journal :
International Conference on Machine Learning, International Conference on Machine Learning, Jun 2019, Los Angeles, United States
Accession number :
edsair.doi.dedup.....ca1d05d1495c525712450bed8aa287db