Data movement limits to frontier model training
- Publication Year : 2024
Abstract
- We present a theoretical model of distributed training, and use it to analyze how far dense and sparse training runs can be scaled. Under our baseline assumptions, given a three-month training duration, data movement bottlenecks begin to significantly lower hardware utilization for training runs exceeding about $10^{28}$ FLOP, two orders of magnitude above the largest training run to date, suggesting that, at recent rates of growth, fundamental barriers to scaling could arrive within about three years. A training run exceeding about $10^{31}$ FLOP is infeasible even at low utilization. However, more aggressive batch size scaling and/or shorter and fatter model shapes, if achievable, have the potential to permit much larger training runs.
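The "about three years" figure follows from simple growth arithmetic. As a minimal sketch (not taken from the paper): assuming the largest run to date is roughly $10^{26}$ FLOP and frontier training compute grows around 4–5x per year, the time to reach the quoted thresholds can be estimated as follows.

```python
import math

# Back-of-the-envelope sketch of the abstract's timeline claim.
# Assumed inputs (not from the paper's text): largest training run to date
# ~1e26 FLOP, and frontier training compute growing ~4.5x per year.
largest_run_flop = 1e26   # assumed size of the largest run to date
bottleneck_flop = 1e28    # threshold where utilization starts to drop (from abstract)
hard_limit_flop = 1e31    # run size described as infeasible (from abstract)
annual_growth = 4.5       # assumed multiplicative growth in compute per year

def years_to_reach(target_flop: float, current_flop: float, growth: float) -> float:
    """Years until compute grows from current_flop to target_flop at a fixed annual rate."""
    return math.log(target_flop / current_flop) / math.log(growth)

print(f"Years to the ~1e28 FLOP bottleneck: "
      f"{years_to_reach(bottleneck_flop, largest_run_flop, annual_growth):.1f}")
print(f"Years to the ~1e31 FLOP hard limit: "
      f"{years_to_reach(hard_limit_flop, largest_run_flop, annual_growth):.1f}")
```

Under these assumed numbers, the two-orders-of-magnitude gap closes in roughly three years, and the $10^{31}$ FLOP limit in under a decade; different starting sizes or growth rates shift these estimates accordingly.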
Details
- Database : arXiv
- Publication Type : Report
- Accession number : edsarx.2411.01137
- Document Type : Working Paper