1. Learning Multi-Modal Scale-Aware Attentions for Efficient and Robust Road Segmentation
- Author
- Zhou, Yunjiao; Yang, Jianfei; Cao, Haozhi; Zeng, Zhaoyang; Zou, Han; and Xie, Lihua
- Abstract
Road segmentation is essential to unmanned systems, supporting road perception and navigation in autonomous driving. Multi-modal road segmentation methods have shown promising results by leveraging complementary RGB and Depth data to provide robust 3D geometry information, but existing methods suffer from severe efficiency problems that hinder their practical application in autonomous driving. Directly concatenating multi-modal features within a densely connected network widens the semantic gaps among modalities and scales and causes high computational and time complexity. To address these issues, we propose a Multi-modal Scale-aware Attention Network (MSAN) that fuses RGB and Depth data effectively via a novel transformer-based cross-attention module, the Multi-modal Scale-aware Transformer (MST), which fuses RGB-D features from a global perspective across multiple scales. To better consolidate features from different scales, we further propose a Scale-aware Attention Module (SAM) that efficiently captures channel-wise attention for cross-scale fusion. Together, these two attention-based modules exploit the complementarity of modalities and scales, narrowing the gaps between them while avoiding complex structures for road segmentation. Extensive experiments demonstrate that MSAN achieves competitive performance at a low computational cost, making it suitable for real-time deployment on edge devices in autonomous driving systems.
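The abstract does not give the internals of MST or SAM. As a rough illustration only, the sketch below assumes that RGB tokens act as queries and Depth tokens as keys and values in standard scaled dot-product cross-attention (with a residual connection), and it substitutes a simplified, parameter-free per-channel softmax across scales (in the spirit of selective-kernel attention) for SAM. The names `cross_attention`, `scale_attention`, and `rand_matrix` are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def matmul(a, b):
    # a: n x k, b: k x m  ->  n x m
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(rgb, depth, wq, wk, wv):
    """Hypothetical RGB-D cross-attention (assumption, not the paper's MST).

    rgb: N x C tokens used as queries; depth: M x C tokens used as keys/values.
    wq, wk: C x d projections; wv: C x C so the residual add is well-defined.
    """
    q, k, v = matmul(rgb, wq), matmul(depth, wk), matmul(depth, wv)
    d = len(q[0])
    scores = matmul(q, transpose(k))  # N x M similarity of RGB to Depth tokens
    attn = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    out = matmul(attn, v)  # N x C depth information gathered per RGB token
    # Residual connection keeps the original RGB signal.
    return [[r + o for r, o in zip(rrow, orow)] for rrow, orow in zip(rgb, out)]


def scale_attention(scales):
    """Hypothetical channel-wise cross-scale fusion (simplified stand-in for SAM).

    scales: list of S maps, each N x C, assumed already resized to one resolution.
    Each channel gets its own softmax weights across the S scales.
    """
    num_scales, n, c_dim = len(scales), len(scales[0]), len(scales[0][0])
    # Global average pooling: one length-C descriptor per scale.
    pooled = [[sum(f[i][c] for i in range(n)) / n for c in range(c_dim)] for f in scales]
    # weights[c][s]: how much scale s contributes to channel c.
    weights = [softmax([pooled[s][c] for s in range(num_scales)]) for c in range(c_dim)]
    return [[sum(weights[c][s] * scales[s][i][c] for s in range(num_scales))
             for c in range(c_dim)] for i in range(n)]


def rand_matrix(rows, cols, std=0.5):
    return [[random.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
C = 4
rgb = rand_matrix(6, C, 1.0)    # 6 RGB tokens with 4 channels
depth = rand_matrix(6, C, 1.0)  # 6 Depth tokens with 4 channels
fused = cross_attention(rgb, depth, rand_matrix(C, C), rand_matrix(C, C), rand_matrix(C, C))
out = scale_attention([fused, rgb])  # treat the two maps as two "scales"
print(len(out), len(out[0]))  # 6 4
```

Global attention over all Depth tokens is what gives each RGB token a view of the whole scene, while the per-channel weighting lets each channel pick the scale that suits it at the cost of only a pooling step and a softmax.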
- Published
- 2024