
Light-Weight Vision Transformer with Parallel Local and Global Self-Attention

Authors :
Ebert, Nikolas
Reichardt, Laurenz
Stricker, Didier
Wasenmüller, Oliver
Publication Year :
2023

Abstract

While transformer architectures have dominated computer vision in recent years, these models cannot easily be deployed on hardware with limited resources for autonomous driving tasks that require real-time performance. Their computational complexity and memory requirements limit their use, especially for applications with high-resolution inputs. In our work, we redesign the powerful state-of-the-art Vision Transformer PLG-ViT into a much more compact and efficient architecture that is suitable for such tasks. We identify computationally expensive blocks in the original PLG-ViT architecture and propose several redesigns aimed at reducing the number of parameters and floating-point operations. As a result of our redesign, we are able to reduce PLG-ViT in size by a factor of 5, with a moderate drop in performance. We propose two variants, optimized for the best trade-off between parameter count and runtime as well as between parameter count and accuracy. With only 5 million parameters, we achieve 79.5% top-1 accuracy on the ImageNet-1K classification benchmark. Our networks demonstrate strong performance on general vision benchmarks such as COCO instance segmentation. In addition, we conduct a series of experiments demonstrating the potential of our approach in solving various tasks specifically tailored to the challenges of autonomous driving and transportation.

Comment: This paper has been accepted at the IEEE Intelligent Transportation Systems Conference (ITSC), 2023.
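The title refers to a block that computes a local (window-based) self-attention and a global (downsampled) self-attention in parallel and fuses the two. The sketch below is only an illustration of that general idea under assumed choices (a 7x7 window, a 7x7 grid of pooled global tokens, channels split evenly across the two branches); it is not the authors' PLG-ViT implementation.

```python
# Minimal sketch of parallel local/global self-attention (illustrative assumptions,
# not the PLG-ViT code): one branch attends within non-overlapping windows, the
# other attends over a pooled low-resolution grid; results are fused channel-wise.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ParallelLocalGlobalAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, window: int = 7, pooled: int = 7):
        super().__init__()
        self.window = window
        self.pooled = pooled
        # Assumption: half the channels feed the local branch, half the global branch.
        self.local_attn = nn.MultiheadAttention(dim // 2, num_heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim // 2, num_heads, batch_first=True)
        self.fuse = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); H and W are assumed divisible by the window size.
        B, C, H, W = x.shape
        xl, xg = x.chunk(2, dim=1)

        # Local branch: self-attention inside each non-overlapping window.
        w = self.window
        local = xl.reshape(B, C // 2, H // w, w, W // w, w)
        local = local.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C // 2)
        local, _ = self.local_attn(local, local, local)
        local = local.reshape(B, H // w, W // w, w, w, C // 2)
        local = local.permute(0, 5, 1, 3, 2, 4).reshape(B, C // 2, H, W)

        # Global branch: pool to a small token grid, attend, upsample back.
        g = F.adaptive_avg_pool2d(xg, self.pooled)        # (B, C/2, p, p)
        g = g.flatten(2).transpose(1, 2)                  # (B, p*p, C/2)
        g, _ = self.global_attn(g, g, g)
        g = g.transpose(1, 2).reshape(B, C // 2, self.pooled, self.pooled)
        g = F.interpolate(g, size=(H, W), mode="bilinear", align_corners=False)

        # Fuse the two parallel branches with a pointwise projection.
        out = torch.cat([local, g], dim=1).permute(0, 2, 3, 1)  # (B, H, W, C)
        return self.fuse(out).permute(0, 3, 1, 2)


if __name__ == "__main__":
    block = ParallelLocalGlobalAttention(dim=64)
    feat = torch.randn(2, 64, 56, 56)   # spatial size divisible by window=7
    print(block(feat).shape)            # torch.Size([2, 64, 56, 56])
```

Running the parallel branches on a channel split keeps the cost roughly that of a single attention at half width, which is in the spirit of the parameter and FLOP reductions described in the abstract; the exact fusion and branch design in the paper may differ.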

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2307.09120
Document Type :
Working Paper