Comparing dynamics: deep neural networks versus glassy systems

Authors :
Marco Baity-Jesi
Mario Geiger
Gérard Ben Arous
Stefano Spigler
Yann LeCun
Matthieu Wyart
Chiara Cammarota
Levent Sagun
Giulio Biroli
Department of Physics and Astronomy [Philadelphia]
University of Pennsylvania [Philadelphia]
Institut de Physique Théorique - UMR CNRS 3681 (IPHT)
Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)
Ecole Polytechnique Fédérale de Lausanne (EPFL)
Laboratoire de Physique Théorique et Modèles Statistiques (LPTMS)
Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)
Institut de Mathématiques (IMA)
King's College London
Facebook AI Research [Paris] (FAIR)
Facebook
Service de Physique Théorique (SPhT)
Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Centre National de la Recherche Scientifique (CNRS)
Laboratoire de Physique Statistique de l'ENS (LPS)
Fédération de recherche du Département de physique de l'Ecole Normale Supérieure - ENS Paris (FRDPENS)
Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Université Paris Diderot - Paris 7 (UPD7)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)
Systèmes Désordonnés et Applications
Laboratoire de physique de l'ENS - ENS Paris (LPENS (UMR_8023))
École normale supérieure - Paris (ENS Paris)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université (SU)-Université Paris Diderot - Paris 7 (UPD7)-École normale supérieure - Paris (ENS Paris)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université (SU)-Université Paris Diderot - Paris 7 (UPD7)
University of Pennsylvania
Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)
École normale supérieure - Paris (ENS-PSL)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Université Paris Diderot - Paris 7 (UPD7)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Université Paris Diderot - Paris 7 (UPD7)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL)
Source :
Journal of Statistical Mechanics: Theory and Experiment, IOP Publishing, 2019, 2019 (12), pp.124013. ⟨10.1088/1742-5468/ab3281⟩; Baity-Jesi, M, Sagun, L, Geiger, M, Spigler, S, Ben Arous, G, Cammarota, C, LeCun, Y, Wyart, M & Biroli, G 2018, 'Comparing Dynamics: Deep Neural Networks versus Glassy Systems', Proceedings of Machine Learning Research, vol. 80, pp. 314-323. <https://arxiv.org/abs/1803.06969>
Publication Year :
2019
Publisher :
HAL CCSD, 2019.

Abstract

We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during the training process the dynamics slows down because of an increasingly large number of flat directions. At large times, when the loss is approaching zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular, the absence of barrier crossing, we find distinctive dynamical behaviors in the two cases, showing that the statistical properties of the corresponding loss and energy landscapes are different. In contrast, when the network is under-parametrized we observe a typical glassy behavior, thus suggesting the existence of different phases depending on whether the network is under-parametrized or over-parametrized.
10 pages, 5 figures. Version accepted at ICML 2018

Details

Language :
English
ISSN :
1742-5468
Database :
OpenAIRE
Journal :
Journal of Statistical Mechanics: Theory and Experiment, IOP Publishing, 2019, 2019 (12), pp.124013. ⟨10.1088/1742-5468/ab3281⟩; Baity-Jesi, M, Sagun, L, Geiger, M, Spigler, S, Ben Arous, G, Cammarota, C, LeCun, Y, Wyart, M & Biroli, G 2018, 'Comparing Dynamics: Deep Neural Networks versus Glassy Systems', Proceedings of Machine Learning Research, vol. 80, pp. 314-323. <https://arxiv.org/abs/1803.06969>
Accession number :
edsair.doi.dedup.....7027c0fc6ec9c7efa4c0731c2bfb57ef