
LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection

Authors :
Chen, Qiang
Su, Xiangbo
Zhang, Xinyu
Wang, Jian
Chen, Jiahui
Shen, Yunpeng
Han, Chuchu
Chen, Ziliang
Xu, Weixiang
Li, Fanrong
Zhang, Shan
Yao, Kun
Ding, Errui
Zhang, Gang
Wang, Jingdong
Publication Year :
2024

Abstract

In this paper, we present a light-weight detection transformer, LW-DETR, which outperforms YOLOs for real-time object detection. The architecture is a simple stack of a ViT encoder, a projector, and a shallow DETR decoder. Our approach leverages recent advanced techniques, such as training-effective techniques (e.g., an improved loss and pretraining) and interleaved window and global attentions for reducing the ViT encoder complexity. We improve the ViT encoder by aggregating multi-level feature maps, namely the intermediate and final feature maps of the ViT encoder, to form richer feature maps, and we introduce a window-major feature map organization to improve the efficiency of the interleaved attention computation. Experimental results demonstrate that the proposed approach is superior to existing real-time detectors, e.g., YOLO and its variants, on COCO and other benchmark datasets. Code and models are available at https://github.com/Atten4Vis/LW-DETR.
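The abstract names window-major feature map organization but does not spell it out. The sketch below is a minimal illustration built on one assumption: "window-major" means storing the tokens window by window, so each attention window becomes a contiguous block of the sequence. Under that assumption, the Python code reorders a row-major token grid once and then interleaves window and global attention on that layout. All helper names (`window_major_order`, `toy_attention`, `window_attention`, `global_attention`) are hypothetical. The scalar attention is a toy stand-in for the multi-head attention inside a ViT. This is not code from the linked repository.

```python
import math
from typing import List


def window_major_order(h: int, w: int, win: int) -> List[int]:
    """Row-major indices of an h x w token grid, listed window by window.

    Assumption: each win x win window becomes a contiguous run of win * win tokens.
    """
    if h % win or w % win:
        raise ValueError("grid size must be divisible by the window size")
    order = []
    for wy in range(0, h, win):          # top row of each window
        for wx in range(0, w, win):      # left column of each window
            for y in range(wy, wy + win):
                for x in range(wx, wx + win):
                    order.append(y * w + x)
    return order


def toy_attention(tokens: List[float]) -> List[float]:
    """Scalar self-attention with q = k = v = token (toy stand-in, not real ViT attention)."""
    out = []
    for q in tokens:
        scores = [q * k for k in tokens]
        m = max(scores)                              # subtract max for a stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(wt * v for wt, v in zip(weights, tokens)) / total)
    return out


def window_attention(tokens: List[float], win: int) -> List[float]:
    """Attention within each contiguous chunk of win * win tokens (expects window-major input)."""
    size = win * win
    out: List[float] = []
    for start in range(0, len(tokens), size):
        out.extend(toy_attention(tokens[start:start + size]))
    return out


def global_attention(tokens: List[float]) -> List[float]:
    """Attention over all tokens; the result does not depend on the token order."""
    return toy_attention(tokens)


if __name__ == "__main__":
    h, w, win = 4, 4, 2
    row_major = [0.1 * i for i in range(h * w)]      # toy "feature map", one scalar per token
    perm = window_major_order(h, w, win)
    x = [row_major[i] for i in perm]                 # reorder once, up front
    # Interleaved layers run on the same window-major sequence with no reshuffling in between.
    x = window_attention(x, win)
    x = global_attention(x)
    x = window_attention(x, win)
    print(perm)
```

This relies on one design point, again under the stated assumption. Global attention is permutation-equivariant: reordering its input reorders its output in the same way. It can therefore run directly on the window-major sequence, so the grid is reordered once instead of between every window layer and global layer.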

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2406.03459
Document Type :
Working Paper