BEV-CFKT: A LiDAR-camera cross-modality-interaction fusion and knowledge transfer framework with transformer for BEV 3D object detection.

Authors :: Wei, Ming
Li, Jiachen
Kang, Hongyi
Huang, Yijie
Lu, Jun-Guo
Source :: Neurocomputing. May 2024, Vol. 582.
Publication Year :: 2024
Abstract: This paper proposes BEV-CFKT, a framework that uses transformer-based knowledge transfer for LiDAR-camera fusion in the Bird's-Eye-View (BEV) space, with the aim of accurate and robust 3D object detection. BEV-CFKT comprises three main components: generation of BEV features from images and point clouds, cross-modality interaction, and hybrid object queries based on a monocular detection head. Unifying point-cloud and image features in the BEV space simplifies modal interaction, facilitates knowledge transfer, and extracts richer structural and semantic information from the multimodal data, which improves the network's performance. To further improve detection performance, BEV-CFKT incorporates a temporal fusion module. In addition, the hybrid object queries module accelerates the convergence of the model. We demonstrate the effectiveness of our approach through an extensive set of experiments. [ABSTRACT FROM AUTHOR]
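
Illustrative note: the abstract does not specify how the cross-modality interaction is computed, so the following is only a minimal sketch of generic scaled dot-product cross-attention between two BEV feature grids, not the authors' method. The function names (`flatten_bev`, `cross_attend`), the toy 2x2 grids, and the omission of learned query/key/value projections are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def flatten_bev(grid):
    """Turn an H x W grid of feature vectors into a flat list of tokens."""
    return [cell for row in grid for cell in row]

def cross_attend(queries, keys_values):
    """Each query token attends over all tokens of the other modality.

    Assumption: no learned projections; keys and values are the raw
    features of the other modality. A residual connection adds the
    attended context back onto the query token.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        context = [
            sum(w * kv[d] for w, kv in zip(weights, keys_values))
            for d in range(dim)
        ]
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused

# Hypothetical toy data: 2 x 2 BEV grids with 3 channels per cell.
lidar_bev = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
             [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]]
camera_bev = [[[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]],
              [[0.5, 0.0, 0.5], [0.2, 0.2, 0.2]]]

lidar_tokens = flatten_bev(lidar_bev)
camera_tokens = flatten_bev(camera_bev)

# Interaction in both directions: each modality queries the other.
lidar_enhanced = cross_attend(lidar_tokens, camera_tokens)
camera_enhanced = cross_attend(camera_tokens, lidar_tokens)
```

Because both modalities share the same BEV grid, the two token lists have matching layouts, which is the property the abstract credits with simplifying modal interaction. A real model would add learned projections, multiple heads, and positional encodings.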

Subjects :: *OBJECT recognition (Computer vision)
*TRANSFORMER models
*KNOWLEDGE transfer
*POINT cloud
*NETWORK performance
*MONOCULARS
*IMAGE fusion
