FusionFormer: A Multi-sensory Fusion in Bird's-Eye-View and Temporal Consistent Transformer for 3D Object Detection

Authors :
Hu, Chunyong
Zheng, Hang
Li, Kun
Xu, Jianyun
Mao, Weibo
Luo, Maochun
Wang, Lingxuan
Chen, Mingxia
Peng, Qihao
Liu, Kaixuan
Zhao, Yiru
Hao, Peihan
Liu, Minzhe
Yu, Kaicheng
Publication Year :
2023

Abstract

Multi-sensor modal fusion has demonstrated strong advantages in 3D object detection tasks. However, existing methods that fuse multi-modal features require transforming features into the bird's-eye-view space and may lose information along the Z-axis, leading to inferior performance. To this end, we propose a novel end-to-end multi-modal fusion transformer-based framework, dubbed FusionFormer, that incorporates deformable attention and residual structures within the fusion encoding module. Specifically, by developing a uniform sampling strategy, our method can sample from 2D image features and 3D voxel features simultaneously, which gives it flexible adaptability and avoids an explicit transformation to the bird's-eye-view space during the feature concatenation process. We further implement a residual structure in our feature encoder to keep the model robust when an input modality is missing. In extensive experiments on nuScenes, a popular autonomous driving benchmark dataset, our method achieves state-of-the-art single-model performance of 72.6% mAP and 75.1% NDS in the 3D object detection task without test-time augmentation.
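The residual idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`sample_features`, `fusion_layer`), the shapes, and the use of plain dot-product attention in place of deformable attention are all assumptions made for illustration. It shows only why a residual update tolerates a missing modality: each available modality adds a correction to the queries, and an absent one is skipped.

```python
import numpy as np

def sample_features(queries, feats):
    """Hypothetical stand-in for attention-based sampling.

    Uses plain softmax dot-product attention (NOT deformable attention)
    to mix one modality's features for each query.
    queries: (num_queries, dim), feats: (num_tokens, dim)
    """
    scores = queries @ feats.T / np.sqrt(queries.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ feats

def fusion_layer(queries, image_feats=None, voxel_feats=None):
    """Residual fusion: each available modality adds an update to the queries.

    A modality passed as None is skipped, so the queries flow through
    that branch unchanged instead of the layer failing.
    """
    out = queries
    for feats in (image_feats, voxel_feats):
        if feats is not None:
            out = out + sample_features(out, feats)  # residual connection
    return out

# Toy inputs with assumed sizes: 4 queries, 8 channels.
rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
image_feats = rng.normal(size=(6, 8))   # e.g. flattened 2D image tokens
voxel_feats = rng.normal(size=(5, 8))   # e.g. flattened 3D voxel tokens

fused_both = fusion_layer(queries, image_feats, voxel_feats)
fused_camera_only = fusion_layer(queries, image_feats=image_feats)
```

In this sketch the same layer runs with both inputs or with only one, and with no inputs it returns the queries unchanged. The paper's encoder may handle a missing modality differently.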

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2309.05257
Document Type :
Working Paper