
Token Pooling in Vision Transformers

Authors:
Marin, Dmitrii
Chang, Jen-Hao Rick
Ranjan, Anurag
Prabhu, Anish
Rastegari, Mohammad
Tuzel, Oncel
Source:
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023
Publication Year:
2021

Abstract

Despite their recent success in many applications, the high computational requirements of vision transformers limit their use in resource-constrained settings. While many existing methods improve the quadratic complexity of attention, self-attention is not the major computation bottleneck in most vision transformers; for example, more than 80% of the computation is spent on fully-connected layers. To improve the computational complexity of all layers, we propose a novel token downsampling method, called Token Pooling, that efficiently exploits redundancies in images and intermediate token representations. We show that, under mild assumptions, softmax-attention acts as a high-dimensional low-pass (smoothing) filter. Its output therefore contains redundancy that can be pruned to achieve a better trade-off between computational cost and accuracy. Our new technique accurately approximates a set of tokens by minimizing the reconstruction error caused by downsampling, and we solve this optimization problem via cost-efficient clustering. We rigorously analyze our method and compare it to prior downsampling approaches. Our experiments show that Token Pooling significantly improves the cost-accuracy trade-off over state-of-the-art downsampling methods. Token Pooling is a simple and effective operator that can benefit many architectures. Applied to DeiT, it achieves the same ImageNet top-1 accuracy using 42% fewer computations.
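The abstract's core idea, replacing a set of tokens with a smaller set of cluster centers that minimize reconstruction error, can be sketched in a few lines. Below is a minimal PyTorch sketch assuming a plain K-means-style clustering as a stand-in for the paper's cost-efficient clustering; the function name `token_pooling` and all parameters are illustrative, not the authors' implementation.

```python
import torch

def token_pooling(tokens: torch.Tensor, num_clusters: int, num_iters: int = 10) -> torch.Tensor:
    """Hypothetical sketch: downsample (N, D) tokens to (num_clusters, D)
    cluster centers that approximate the originals with low reconstruction error."""
    # Initialize centers by sampling distinct tokens.
    idx = torch.randperm(tokens.shape[0])[:num_clusters]
    centers = tokens[idx].clone()
    for _ in range(num_iters):
        # Assign every token to its nearest center (Euclidean distance).
        assign = torch.cdist(tokens, centers).argmin(dim=1)  # (N,)
        # Update each center to the mean of its assigned tokens.
        for k in range(num_clusters):
            members = tokens[assign == k]
            if members.numel() > 0:
                centers[k] = members.mean(dim=0)
    return centers

# Example: pool 196 patch tokens (DeiT-style at 224x224, D=384) down to 98.
tokens = torch.randn(196, 384)
pooled = token_pooling(tokens, num_clusters=98)
print(pooled.shape)  # torch.Size([98, 384])
```

Halving the token count this way shrinks the input to every subsequent layer, including the fully-connected layers where, per the abstract, most of the computation is spent.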

Details

Database:
arXiv
Journal:
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023
Publication Type:
Report
Accession number:
edsarx.2110.03860
Document Type:
Working Paper