1. LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection
- Author
Chen, Qiang; Su, Xiangbo; Zhang, Xinyu; Wang, Jian; Chen, Jiahui; Shen, Yunpeng; Han, Chuchu; Chen, Ziliang; Xu, Weixiang; Li, Fanrong; Zhang, Shan; Yao, Kun; Ding, Errui; Zhang, Gang; Wang, Jingdong
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
In this paper, we present a light-weight detection transformer, LW-DETR, which outperforms YOLOs for real-time object detection. The architecture is a simple stack of a ViT encoder, a projector, and a shallow DETR decoder. Our approach leverages recent advances, including training-effective techniques (e.g., an improved loss and pretraining) and interleaved window and global attention to reduce the complexity of the ViT encoder. We improve the ViT encoder by aggregating multi-level feature maps, namely the intermediate and final feature maps of the encoder, to form richer feature maps, and we introduce a window-major feature map organization that makes interleaved attention computation more efficient. Experimental results demonstrate that the proposed approach is superior to existing real-time detectors, e.g., YOLO and its variants, on COCO and other benchmark datasets. Code and models are available at https://github.com/Atten4Vis/LW-DETR.
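The window-major organization mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It is a pure-Python toy that assumes a square, non-overlapping window and a feature map stored as a flat list of tokens, and the function names `row_major_to_window_major` and `split_windows` are hypothetical. The idea it shows is that once tokens are stored window by window, each window is a contiguous slice, so window attention can read its tokens without a separate reordering step.

```python
def row_major_to_window_major(tokens, height, width, window):
    """Reorder a flat row-major token list so each window's tokens are contiguous.

    Assumption for this sketch: windows are square, non-overlapping,
    and tile the feature map exactly.
    """
    if len(tokens) != height * width:
        raise ValueError("tokens must contain height * width entries")
    if height % window or width % window:
        raise ValueError("height and width must be divisible by window")
    out = []
    # Visit windows top-to-bottom, left-to-right.
    for wy in range(0, height, window):
        for wx in range(0, width, window):
            # Inside a window, keep the usual row-major order.
            for y in range(wy, wy + window):
                for x in range(wx, wx + window):
                    out.append(tokens[y * width + x])
    return out


def split_windows(window_major_tokens, window):
    """Slice a window-major token list into per-window groups (no reordering)."""
    size = window * window
    return [
        window_major_tokens[i:i + size]
        for i in range(0, len(window_major_tokens), size)
    ]


# A 4x4 feature map with tokens labeled 0..15 and 2x2 windows.
tokens = list(range(16))
window_major = row_major_to_window_major(tokens, height=4, width=4, window=2)
windows = split_windows(window_major, window=2)
print(windows)
```

In this toy layout the reordering happens once up front, and every later window-attention step only slices the list. A global attention step sees the same set of tokens in a different order.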
- Published
2024