
Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$

Authors:
Roberts, Adam
Chung, Hyung Won
Levskaya, Anselm
Mishra, Gaurav
Bradbury, James
Andor, Daniel
Narang, Sharan
Lester, Brian
Gaffney, Colin
Mohiuddin, Afroz
Hawthorne, Curtis
Lewkowycz, Aitor
Salcianu, Alex
van Zee, Marc
Austin, Jacob
Goodman, Sebastian
Soares, Livio Baldini
Hu, Haitang
Tsvyashchenko, Sasha
Chowdhery, Aakanksha
Bastings, Jasmijn
Bulian, Jannis
Garcia, Xavier
Ni, Jianmo
Chen, Andrew
Kenealy, Kathleen
Clark, Jonathan H.
Lee, Stephan
Garrette, Dan
Lee-Thorp, James
Raffel, Colin
Shazeer, Noam
Ritter, Marvin
Bosma, Maarten
Passos, Alexandre
Maitin-Shepard, Jeremy
Fiedel, Noah
Omernick, Mark
Saeta, Brennan
Sepassi, Ryan
Spiridonov, Alexander
Newlan, Joshua
Gesmundo, Andrea
Publication Year: 2022

Abstract

Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we present two software libraries that ease these issues: $\texttt{t5x}$ simplifies the process of building and training large language models at scale while maintaining ease of use, and $\texttt{seqio}$ provides a task-based API for simple creation of fast and reproducible training data and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data. Along with the libraries, we release configurations and instructions for T5-like encoder-decoder models as well as GPT-like decoder-only architectures. $\texttt{t5x}$ and $\texttt{seqio}$ are open source and available at https://github.com/google-research/t5x and https://github.com/google/seqio, respectively.
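
To illustrate the task-based API mentioned in the abstract, below is a minimal sketch of registering a $\texttt{seqio}$ Task and requesting a deterministic tf.data pipeline from it. It follows the registration pattern documented in the seqio README; the task name, TFDS dataset version, vocabulary path, and the to_inputs_and_targets preprocessor are placeholders invented for this example rather than anything released with the paper.

import functools

import seqio
import tensorflow as tf

# Placeholder vocabulary path; in practice this points at a SentencePiece model.
VOCAB = seqio.SentencePieceVocabulary("/path/to/sentencepiece.model")

def to_inputs_and_targets(dataset):
  """Illustrative preprocessor: uses each raw 'text' field as the target
  with empty inputs (an LM-style setup)."""
  return dataset.map(
      lambda ex: {"inputs": "", "targets": ex["text"]},
      num_parallel_calls=tf.data.AUTOTUNE)

seqio.TaskRegistry.add(
    "example_lm_task",                                      # hypothetical task name
    source=seqio.TfdsDataSource(tfds_name="c4/en:3.0.1"),   # placeholder TFDS source
    preprocessors=[
        to_inputs_and_targets,
        seqio.preprocessors.tokenize,                       # tokenize with the task vocabularies
        seqio.preprocessors.append_eos_after_trim,          # trim to length and append EOS
    ],
    output_features={
        "inputs": seqio.Feature(vocabulary=VOCAB, add_eos=False),
        "targets": seqio.Feature(vocabulary=VOCAB, add_eos=True),
    },
)

# The registered task can then be turned into a reproducible tf.data pipeline by name:
ds = seqio.get_mixture_or_task("example_lm_task").get_dataset(
    sequence_length={"inputs": 32, "targets": 512},
    split="train",
    shuffle=True,
    seed=42,
)

Training with $\texttt{t5x}$ then typically amounts to pointing its trainer at a gin configuration, roughly of the form python -m t5x.train --gin_file=<config>.gin --gin.MODEL_DIR=<dir>; the released configurations for T5-like encoder-decoder and GPT-like decoder-only models are provided in the t5x repository linked above.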

Details

Database: arXiv
Publication Type: Report
Accession number: edsarx.2203.17189
Document Type: Working Paper