SGD Converges to Global Minimum in Deep Learning via Star-convex Path

Authors :
Zhou, Yi
Yang, Junjie
Zhang, Huishuai
Liang, Yingbin
Tarokh, Vahid
Publication Year :
2019
Publisher :
arXiv, 2019.

Abstract

Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a variety of deep neural networks. However, there is still a lack of understanding of how and why SGD can train these complex networks towards a global minimum. In this study, we establish the convergence of SGD to a global minimum for nonconvex optimization problems that are commonly encountered in neural network training. Our argument exploits the following two important properties: 1) the training loss can achieve zero value (approximately), which has been widely observed in deep learning; 2) SGD follows a star-convex path, which is verified by various experiments in this paper. In such a context, our analysis shows that SGD, although it has long been considered a randomized algorithm, converges in an intrinsically deterministic manner to a global minimum.

Comment: ICLR 2019
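
As an illustration of the two properties named in the abstract, the sketch below runs SGD on a small overparameterized least-squares problem, where zero training loss is attainable, and evaluates at each iterate one common first-order form of star-convexity with respect to a global minimizer x*: l(x) - l(x*) + <x* - x, grad l(x)> <= 0, applied to the sampled per-example loss. The abstract does not state the inequality, so this form is an assumption, as are the toy problem, the names `star_convex_gap` and `run_sgd_check`, and all hyperparameters; none of this is the paper's experimental setup. Least squares is convex, so the inequality holds trivially here, whereas the abstract's claim concerns nonconvex neural network losses.

```python
import numpy as np


def star_convex_gap(loss, grad, x, x_star):
    """Return loss(x) - loss(x_star) + <x_star - x, grad(x)>.

    A value <= 0 means the (assumed) first-order star-convexity
    inequality holds at x with respect to the minimizer x_star.
    """
    return loss(x) - loss(x_star) + np.dot(x_star - x, grad(x))


def run_sgd_check(n=20, d=50, steps=200, lr=0.01, seed=0):
    """Run SGD on a toy least-squares problem and record the gap per step."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n, d))      # n < d: overparameterized (illustrative)
    x_star = rng.normal(size=d)      # a global minimizer by construction
    b = A @ x_star                   # labels chosen so the loss at x_star is zero

    x = np.zeros(d)
    gaps = []
    for _ in range(steps):
        i = rng.integers(n)          # sample one training example
        a_i, b_i = A[i], b[i]

        def loss_i(z):
            return 0.5 * (a_i @ z - b_i) ** 2

        def grad_i(z):
            return (a_i @ z - b_i) * a_i

        # Check the inequality at the current iterate, then take the SGD step.
        gaps.append(star_convex_gap(loss_i, grad_i, x, x_star))
        x = x - lr * grad_i(x)
    return np.array(gaps), x, x_star


if __name__ == "__main__":
    gaps, x_final, x_star = run_sgd_check()
    print("largest gap along the path:", gaps.max())
    print("distance to x_star:", np.linalg.norm(x_final - x_star))
```

In this toy setting every recorded gap is nonpositive up to floating-point rounding, and the distance from the iterate to x* does not increase from step to step. A run of this kind loosely mirrors the "intrinsically deterministic" behavior the abstract describes, but it is not evidence for the nonconvex case.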

Details

Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....9f1f0d9278c6fe6c7e944dddcbe1c292
Full Text :
https://doi.org/10.48550/arxiv.1901.00451