Trainable Weight Averaging: A General Approach for Subspace Training

Authors :: Li, Tao
Huang, Zhehao
Wu, Yingwen
He, Zhengbao
Tao, Qinghua
Huang, Xiaolin
Lin, Chih-Jen
Publication Year :: 2022
Abstract: Training deep neural networks (DNNs) in low-dimensional subspaces is a promising direction for achieving efficient training and better generalization performance. Our previous work extracts the subspaces by applying dimension reduction to the training trajectory, verifying that a DNN can be trained well in a tiny subspace. However, that method is inefficient for subspace extraction and numerically unstable, limiting its applicability to more general tasks. In this paper, we connect subspace training to weight averaging and propose Trainable Weight Averaging (TWA), a general approach for subspace training. TWA is efficient in terms of subspace extraction and easy to use, making it a promising new optimizer for DNN training. Our design also includes an efficient scheme that allows parallel training across multiple nodes to handle large-scale problems and evenly distribute the memory and computation burden to each node. TWA can be used for both efficient training and generalization enhancement, for different neural network architectures, and for various tasks ranging from image classification and object detection to natural language processing. The implementation is available at https://github.com/nblt/TWA, which includes extensive experiments covering various benchmark computer vision and natural language processing tasks with various architectures.
Comment: Journal version in progress. Previously accepted to ICLR 2023.

Details

Database :: OAIster
Publication Type :: Electronic Resource
Accession number :: edsoai.on1333773526
Document Type :: Electronic Resource
