Start Over

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

Authors :: Wortsman, Mitchell
Ilharco, Gabriel
Gadre, Samir Yitzhak
Roelofs, Rebecca
Gontijo-Lopes, Raphael
Morcos, Ari S.
Namkoong, Hongseok
Farhadi, Ali
Carmon, Yair
Kornblith, Simon
Schmidt, Ludwig
Publication Year :: 2022
Abstract: The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin. We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations often improves accuracy and robustness. Unlike a conventional ensemble, we may average many models without incurring any additional inference or memory costs -- we call the results "model soups." When fine-tuning large pre-trained models such as CLIP, ALIGN, and a ViT-G pre-trained on JFT, our soup recipe provides significant improvements over the best model in a hyperparameter sweep on ImageNet. The resulting ViT-G model, which attains 90.94% top-1 accuracy on ImageNet, achieved a new state of the art. Furthermore, we show that the model soup approach extends to multiple image classification and natural language processing tasks, improves out-of-distribution performance, and improves zero-shot performance on new downstream tasks. Finally, we analytically relate the performance similarity of weight-averaging and logit-ensembling to flatness of the loss and confidence of the predictions, and validate this relation empirically. Code is available at https://github.com/mlfoundations/model-soups.<br />Comment: ICML 2022. The last three authors contributed equally

Subjects :: Computer Science - Machine Learning
Computer Science - Computation and Language
Computer Science - Computer Vision and Pattern Recognition

Details

Database :: arXiv
Publication Type :: Report
Accession number :: edsarx.2203.05482
Document Type :: Working Paper

Tools

Email
Cite

Printer

Authors Abstract Subjects Details

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

Abstract

Subjects

Details

Tools

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

Abstract

Subjects

Details

Tools

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources