Transformers learn through gradual rank increase

Authors :
Boix-Adsera, Enric
Littwin, Etai
Abbe, Emmanuel
Bengio, Samy
Susskind, Joshua
Publication Year :
2023

Abstract

We identify incremental learning dynamics in transformers, where the difference between trained and initial weights progressively increases in rank. We rigorously prove this occurs under the simplifying assumptions of diagonal weight matrices and small initialization. Our experiments support the theory and also show that the phenomenon can occur in practice without the simplifying assumptions.

Comment: 39 pages, to appear in NeurIPS 2023
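
The quantity the abstract refers to, the rank of the difference between trained and initial weights, can be estimated from the singular values of that difference. The following is a minimal sketch and not the authors' code: the helper names `numerical_rank` and `synthetic_checkpoint`, the relative threshold `rel_tol`, and the synthetic weights are all assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def numerical_rank(delta, rel_tol=1e-2):
    """Count singular values of `delta` above rel_tol times the largest one.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    s = np.linalg.svd(delta, compute_uv=False)  # sorted in descending order
    if s[0] == 0:
        return 0
    return int(np.sum(s > rel_tol * s[0]))


# Synthetic stand-in for training checkpoints (not data from the paper):
# a small random initialization plus an update of exactly rank k.
rng = np.random.default_rng(0)
d = 16
w_init = 1e-3 * rng.standard_normal((d, d))
u = np.linalg.qr(rng.standard_normal((d, d)))[0]  # orthonormal columns
v = np.linalg.qr(rng.standard_normal((d, d)))[0]


def synthetic_checkpoint(k):
    """Return hypothetical weights whose difference from w_init has rank k."""
    return w_init + u[:, :k] @ v[:, :k].T


# Rank of (checkpoint - initialization) for three successive checkpoints.
ranks = [numerical_rank(synthetic_checkpoint(k) - w_init) for k in (1, 2, 3)]
print(ranks)
```

With real checkpoints, `synthetic_checkpoint(k)` would be replaced by a weight matrix saved at training step k, and the reported rank would depend on the chosen threshold.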

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2306.07042
Document Type :
Working Paper