CrossFormer: Cross-guided attention for multi-modal object detection.
- Source :
- Pattern Recognition Letters. Mar 2024, Vol. 179, p144-150. 7p.
- Publication Year :
- 2024
Abstract
- Object detection is one of the essential tasks in a variety of real-world applications such as autonomous driving and robotics. In real-world scenarios, unfortunately, there are numerous challenges such as illumination changes, adverse weather conditions, and geographical changes, to name a few. To tackle these challenges, we propose a novel multi-modal object detection model built upon a hierarchical transformer and cross-guidance between different modalities. The proposed hierarchical transformer consists of domain-specific feature extraction networks whose intermediate features are connected by the proposed Cross-Guided Attention Module (CGAM) to enrich their representational power. Specifically, in the CGAM, one domain is regarded as the guide and the other as the base. Cross-modal attention from the guide to the base is then applied to the base feature. The CGAM works bidirectionally in parallel, exchanging roles between the modalities to refine both sets of multi-modal features simultaneously. Experimental results on the FLIR-aligned, LLVIP, and KAIST multispectral pedestrian datasets demonstrate that the proposed method is superior to previous multi-modal detection algorithms both quantitatively and qualitatively.
- • Cross-guided attention mechanism via complementary interactions between modalities.
- • Multi-scale attention maps generated by the proposed hierarchical transformer.
- • State-of-the-art performance on the FLIR-aligned, LLVIP, and KAIST multispectral datasets.
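The guide/base interaction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes standard scaled dot-product cross-attention with a residual connection, flattened spatial tokens, and no learned projections; all function names (`cross_attention`, `cgam_sketch`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(base, guide):
    # Queries come from the base modality; keys/values from the guide,
    # so the guide steers how the base feature is refined.
    d = base.shape[-1]
    scores = base @ guide.T / np.sqrt(d)   # (N_base, N_guide) similarity
    attn = softmax(scores, axis=-1)        # attention from guide to base
    return base + attn @ guide             # residual refinement of the base

def cgam_sketch(rgb_feat, thermal_feat):
    # Bidirectional, in parallel: each modality plays guide for the other,
    # and both refined features are returned simultaneously.
    rgb_refined = cross_attention(rgb_feat, thermal_feat)
    thermal_refined = cross_attention(thermal_feat, rgb_feat)
    return rgb_refined, thermal_refined

# Toy features: 16 spatial tokens per modality, 64-dim each (hypothetical sizes).
rgb = np.random.randn(16, 64)
thermal = np.random.randn(16, 64)
r, t = cgam_sketch(rgb, thermal)
print(r.shape, t.shape)  # (16, 64) (16, 64)
```

In the paper this exchange is applied at intermediate stages of the hierarchical transformer, yielding multi-scale attention maps; the sketch above shows only a single scale.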
Details
- Language :
- English
- ISSN :
- 0167-8655
- Volume :
- 179
- Database :
- Academic Search Index
- Journal :
- Pattern Recognition Letters
- Publication Type :
- Academic Journal
- Accession number :
- 175871154
- Full Text :
- https://doi.org/10.1016/j.patrec.2024.02.012