
Multi-Modal and Multi-Scale Fusion 3D Object Detection of 4D Radar and LiDAR for Autonomous Driving

Authors :
Wang, Li
Zhang, Xinyu
Li, Jun
Xu, Baowei
Fu, Rong
Chen, Haifeng
Yang, Lei
Jin, Dafeng
Zhao, Lijun
Source :
IEEE Transactions on Vehicular Technology; 2023, Vol. 72, Issue 5, pp. 5628-5641, 14p
Publication Year :
2023

Abstract

Multi-modal fusion overcomes the inherent limitations of single-sensor perception in 3D object detection for autonomous driving. Fusing 4D Radar and LiDAR can extend the detection range and improve robustness. Nevertheless, the differing data characteristics and noise distributions of the two sensors hinder performance when they are integrated directly. Therefore, we are the first to propose a novel fusion method for 4D Radar and LiDAR, termed M²-Fusion, based on Multi-modal and Multi-scale fusion. To better integrate the two sensors, we propose an Interaction-based Multi-Modal Fusion (IMMF) method that uses a self-attention mechanism to learn features from each modality and exchange intermediate-layer information. To address the precision-efficiency trade-off of single-resolution voxel division, we also put forward a Center-based Multi-Scale Fusion (CMSF) method that first regresses the center points of objects and then extracts features at multiple resolutions. Furthermore, we present a data preprocessing method based on a Gaussian distribution that effectively suppresses noise and reduces the errors caused by point cloud divergence of 4D Radar data in the x-z plane. To evaluate the proposed fusion method, a series of experiments was conducted on the Astyx HiRes 2019 dataset, which includes calibrated 4D Radar and 16-line LiDAR data. The results demonstrate that our fusion method compares favorably with state-of-the-art algorithms. Compared with PointPillars, our method achieves mAP (mean average precision) gains of 5.64% and 13.57% for 3D and BEV (bird's eye view) detection of the car class at the moderate difficulty level, respectively.
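
The abstract describes an interaction-based fusion step in which intermediate features from the radar and LiDAR branches are exchanged through self-attention. The sketch below illustrates that general idea in PyTorch; it is not the authors' implementation, and the module name, tensor shapes, and fusion-by-residual-addition design are assumptions made only for illustration.

```python
# Minimal sketch, assuming IMMF-style fusion is approximated by joint self-attention
# over concatenated radar and LiDAR tokens, after which the attended features are
# split back so each branch receives information from the other modality.
# Names and shapes are hypothetical, not taken from the paper.
import torch
import torch.nn as nn


class InteractionFusionBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, radar_feat: torch.Tensor, lidar_feat: torch.Tensor):
        # radar_feat, lidar_feat: (B, N, C) intermediate-layer features from each
        # branch (e.g., flattened pillar/voxel features).
        tokens = torch.cat([radar_feat, lidar_feat], dim=1)   # (B, 2N, C)
        attended, _ = self.attn(tokens, tokens, tokens)        # joint self-attention
        tokens = self.norm(tokens + attended)                  # residual + norm
        n = radar_feat.shape[1]
        # Split back so both branches carry the exchanged information forward.
        return tokens[:, :n], tokens[:, n:]


if __name__ == "__main__":
    block = InteractionFusionBlock(channels=64)
    radar = torch.randn(2, 128, 64)   # toy radar features
    lidar = torch.randn(2, 128, 64)   # toy LiDAR features
    r_out, l_out = block(radar, lidar)
    print(r_out.shape, l_out.shape)   # torch.Size([2, 128, 64]) twice
```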

Details

Language :
English
ISSN :
0018-9545
Volume :
72
Issue :
5
Database :
Supplemental Index
Journal :
IEEE Transactions on Vehicular Technology
Publication Type :
Periodical
Accession number :
ejs63073785
Full Text :
https://doi.org/10.1109/TVT.2022.3230265