1. Generalization emerges from local optimization in a self-organized learning network
- Author
- Barland, S. and Gil, L.
- Subjects
- Nonlinear Sciences - Adaptation and Self-Organizing Systems, Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Machine Learning
- Abstract
We design and analyze a new paradigm for building supervised learning networks, driven only by local optimization rules without relying on a global error function. Traditional neural networks with a fixed topology are made up of identical nodes and derive their expressiveness from an appropriate adjustment of connection weights. In contrast, our network stores new knowledge in the nodes accurately and instantaneously, in the form of a lookup table. Only then is some of this information structured and incorporated into the network geometry. The training error is initially zero by construction and remains so throughout the network topology transformation phase. The latter involves a small number of local topological transformations, such as splitting or merging of nodes and adding binary connections between them. The choice of operations to be carried out is driven only by optimization of expressivity at the local scale. What we are primarily looking for in a learning network is its ability to generalize, i.e., its capacity to correctly answer questions for which it has never learned the answers. We show on numerous examples of classification tasks that the networks generated by our algorithm systematically reach a state of perfect generalization when the number of learned examples becomes sufficiently large. We report on the dynamics of the change of state and show that it is abrupt, with the distinctive characteristics of a first-order phase transition, a phenomenon already observed for traditional learning networks and known as grokking. In addition to proposing a non-potential approach for the construction of learning networks, our algorithm makes it possible to rethink the grokking transition in a new light, in which the acquisition of training data and the topological structuring of those data are completely decoupled phenomena.
- Comment
- This paper is submitted to Phys. Rev. X. It is a physicist's study that focuses on a new paradigm for deep learning networks. We would have liked to choose other keywords for arXiv to reach a wider community, but do not have the rights to do so.
- Published
- 2024
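
The mechanism the abstract describes, exact lookup-table storage first, then purely local merge operations that keep the training error pinned at zero, can be illustrated with a toy. The sketch below is not the authors' algorithm: it replaces their node splitting/merging and binary connections with a simple wildcard-merging of Boolean patterns, and all names (`LearningNetwork`, `merge_step`, `_overlaps`) are hypothetical. It only shows how exact memorization plus local, error-preserving coarsening can end up answering inputs that were never learned.

```python
# Toy sketch of "memorize exactly, then coarsen locally" on Boolean patterns.
# Hypothetical illustration; NOT the paper's node-based algorithm.

def _overlaps(p, q):
    """True if two wildcard patterns can match at least one common input."""
    return all(a == '*' or b == '*' or a == b for a, b in zip(p, q))

class LearningNetwork:
    def __init__(self):
        # Exact storage: by construction the training error is zero.
        self.table = {}  # maps input pattern (tuple of bits / '*') -> label

    def learn(self, x, y):
        # New knowledge is stored accurately and instantaneously.
        self.table[tuple(x)] = y

    def merge_step(self):
        """One local operation: merge two same-label patterns that disagree
        at a single concrete position, replacing it with a wildcard. The
        merge is accepted only if the coarser pattern cannot collide with
        an entry of a different label, so stored answers never change."""
        keys = list(self.table)
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                if self.table[a] != self.table[b]:
                    continue
                hard = [k for k, (p, q) in enumerate(zip(a, b))
                        if p != '*' and q != '*' and p != q]
                if len(hard) != 1:
                    continue
                merged = tuple('*' if p != q else p for p, q in zip(a, b))
                label = self.table[a]
                if any(lbl != label and _overlaps(merged, pat)
                       for pat, lbl in self.table.items()):
                    continue  # would contradict a stored example
                del self.table[a], self.table[b]
                self.table[merged] = label
                return True  # exactly one local operation performed
        return False

    def predict(self, x):
        # A wildcard pattern can answer inputs that were never learned.
        for pattern, label in self.table.items():
            if all(p == '*' or p == q for p, q in zip(pattern, x)):
                return label
        return None  # no stored or generalized knowledge applies
```

Trained on seven of the eight 3-bit inputs for a target that depends only on the first bit, the repeated local merges compress the table down to two wildcard patterns, and the held-out input is then answered correctly, loosely analogous to the memorization-to-generalization transition the paper studies:

```python
from itertools import product

net = LearningNetwork()
for bits in product((0, 1), repeat=3):
    if bits != (1, 1, 1):          # hold out one input entirely
        net.learn(bits, bits[0])   # label = first bit
while net.merge_step():            # apply local operations until none fits
    pass
print(net.table)                   # {(1,'*','*'): 1, (0,'*','*'): 0}
print(net.predict((1, 1, 1)))      # -> 1, an answer that was never learned
```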