MVTr: multi-feature voxel transformer for 3D object detection.

Authors :: Ai, Lingmei; Xie, Zhuoyu; Yao, Ruoxia; Yang, Mengyao
Source :: Visual Computer; Mar2024, Vol. 40 Issue 3, p1453-1466, 14p
Publication Year :: 2024
Abstract: Convolutional neural networks have become a powerful tool for partial 3D object detection. However, their power has not been fully realized for focusing on global information, which is crucial for object detection. In this paper, we resolve the problem with a multi-feature voxel transformer (MVTr), an architecture that extracts long-range relationship features through self-attention between multi-feature voxels. In general, converting a point cloud to a voxel representation can greatly reduce computation, but the attention network would still take a long time to attend to the car voxels in a huge real 3D scene. To this end, we propose a semantic voxel module, which takes semantic voxels as input and cooperates with a sparse voxel module and a non-empty voxel module to extract features. The semantic voxels are generated from image segmentation and point cloud projection, which retains only a large number of car voxels. To further enlarge the attention range while maintaining a favorable computational cost, we propose two attention mechanisms for multi-head attention: local attention and stumpy attention. Finally, we propose the fusion attention module, which adds channel attention and spatial attention to the 2D backbone network. MVTr combines the semantic information of the image with the 3D information of the point cloud and can be applied to most 3D object detection tasks. Experimental results on the KITTI dataset show that our method is effective and that its precision has significant advantages over other similar feature fusion-based methods. [ABSTRACT FROM AUTHOR]
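
Illustrative note: the abstract says semantic voxels come from image segmentation combined with point cloud projection. The sketch below shows one plausible reading of that step, not the authors' implementation. It assumes a pinhole camera, points already expressed in the camera frame (x right, y down, z forward), and a binary segmentation mask indexed as mask[row][column]. The names project_to_pixel and semantic_voxels, the intrinsics tuple, and voxel_size are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def project_to_pixel(p: Point, fx: float, fy: float,
                     cx: float, cy: float) -> Optional[Tuple[int, int]]:
    """Pinhole projection of a camera-frame point to integer pixel (u, v)."""
    x, y, z = p
    if z <= 0.0:  # point is behind the camera, so it has no valid pixel
        return None
    u = math.floor(fx * x / z + cx)
    v = math.floor(fy * y / z + cy)
    return u, v


def semantic_voxels(points: Sequence[Point],
                    mask: Sequence[Sequence[bool]],
                    intrinsics: Tuple[float, float, float, float],
                    voxel_size: float) -> Dict[Voxel, List[Point]]:
    """Keep points that project onto a foreground mask pixel, then voxelize.

    Assumption: True in `mask` marks the class of interest (e.g. car).
    """
    fx, fy, cx, cy = intrinsics
    height, width = len(mask), len(mask[0])
    voxels: Dict[Voxel, List[Point]] = {}
    for p in points:
        pix = project_to_pixel(p, fx, fy, cx, cy)
        if pix is None:
            continue
        u, v = pix
        if not (0 <= u < width and 0 <= v < height):
            continue  # projects outside the image
        if not mask[v][u]:
            continue  # background pixel: drop the point
        # Integer voxel index along each axis.
        key = tuple(math.floor(c / voxel_size) for c in p)
        voxels.setdefault(key, []).append(p)
    return voxels
```

Only the voxels returned by such a filter would be passed to the attention layers, which is one way to read the abstract's claim that the network no longer has to search the whole scene for car voxels.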