
CrossFormer: Cross-guided attention for multi-modal object detection.

Authors :
Lee, Seungik
Park, Jaehyeong
Park, Jinsun
Source :
Pattern Recognition Letters, Vol. 179, March 2024, pp. 144-150 (7 pages).
Publication Year :
2024

Abstract

Object detection is one of the essential tasks in a variety of real-world applications such as autonomous driving and robotics. In real-world scenarios, unfortunately, there are numerous challenges such as illumination changes, adverse weather conditions, and geographical changes, to name a few. To tackle these challenges, we propose a novel multi-modal object detection model built upon a hierarchical transformer and cross-guidance between different modalities. The proposed hierarchical transformer consists of domain-specific feature extraction networks whose intermediate features are connected by the proposed Cross-Guided Attention Module (CGAM) to enrich their representational power. Specifically, in the CGAM, one domain is regarded as the guide and the other as the base; the cross-modal attention from the guide to the base is then applied to the base feature. The CGAM works bidirectionally in parallel, exchanging roles between modalities to refine the multi-modal features simultaneously. Experimental results on the FLIR-aligned, LLVIP, and KAIST multispectral pedestrian datasets demonstrate that the proposed method is superior to previous multi-modal detection algorithms both quantitatively and qualitatively.

• Cross-guided attention mechanism based on complementary interactions between modalities.
• Multi-scale attention maps generated by the proposed hierarchical transformer.
• State-of-the-art performance on the FLIR-aligned, LLVIP, and KAIST multispectral datasets.
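The CGAM description above maps naturally onto standard cross-attention. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the class names, the choice of queries from the base and keys/values from the guide, the residual-plus-normalization step, and all dimensions are illustrative assumptions based only on the abstract.

import torch
import torch.nn as nn

class CrossGuidedAttention(nn.Module):
    """One direction of cross-guidance: the guide modality refines the
    base modality. Queries come from the base feature, keys and values
    from the guide feature (an assumed formulation)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, base: torch.Tensor, guide: torch.Tensor) -> torch.Tensor:
        # base, guide: (batch, tokens, dim) flattened feature maps.
        attended, _ = self.attn(query=base, key=guide, value=guide)
        # Residual connection keeps the base feature and adds the guidance.
        return self.norm(base + attended)

class CGAMSketch(nn.Module):
    """Bidirectional cross-guidance: each modality refines the other in
    parallel, exchanging guide/base roles as the abstract describes."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.rgb_from_thermal = CrossGuidedAttention(dim, num_heads)
        self.thermal_from_rgb = CrossGuidedAttention(dim, num_heads)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor):
        # Both directions read the *input* features, so the two
        # refinements happen simultaneously rather than sequentially.
        rgb_refined = self.rgb_from_thermal(base=rgb, guide=thermal)
        thermal_refined = self.thermal_from_rgb(base=thermal, guide=rgb)
        return rgb_refined, thermal_refined

if __name__ == "__main__":
    rgb = torch.randn(2, 196, 256)      # e.g. a 14x14 feature map with 256 channels
    thermal = torch.randn(2, 196, 256)
    cgam = CGAMSketch(dim=256)
    r, t = cgam(rgb, thermal)
    print(r.shape, t.shape)             # torch.Size([2, 196, 256]) for both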

Details

Language :
English
ISSN :
0167-8655
Volume :
179
Database :
Academic Search Index
Journal :
Pattern Recognition Letters
Publication Type :
Academic Journal
Accession number :
175871154
Full Text :
https://doi.org/10.1016/j.patrec.2024.02.012