Cross-scale cascade transformer for multimodal human action recognition.

Authors :
Liu, Zhen
Cheng, Qin
Song, Chengqun
Cheng, Jun
Source :
Pattern Recognition Letters. Apr2023, Vol. 168, p17-23. 7p.
Publication Year :
2023

Abstract

• A cross-modal and cross-scale fusion module is proposed to perform multimodal feature interaction.
• The proposed fusion network can handle different multimodal input combinations and obtains significant performance improvements.
• Visualization of multimodal features shows the complementary information learned by the fusion network.
• Comparisons with state-of-the-art methods on public benchmarks show the superiority of the proposed method.

Human action recognition can benefit from multimodal information when classifying actions in complex situations. However, existing works combine multiple heterogeneous modalities either by score fusion or by simple feature integration, and therefore fail to exploit multimodal complementary information effectively. In this paper, we propose a Cross-Scale Cascade Multimodal Fusion Transformer (CSCMFT) that performs interaction and fusion among the multi-scale features of different modalities, yielding a multimodal complementary representation for RGB-D-based human action recognition. The Cross-Modal Cross-Scale Mixer (CCM) is the basic component of CSCMFT; it captures cross-modal relations and propagates the fused information across scales. Furthermore, CSCMFT still achieves significant improvements when applied to different multimodal combinations, indicating its generality and scalability. Experimental results show that CSCMFT fully exploits the complementary semantic information between RGB and depth maps and outperforms state-of-the-art RGB-D-based methods on the NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets. [ABSTRACT FROM AUTHOR]
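
Illustrative note (not part of the author abstract): the abstract contrasts score fusion with feature-level interaction between modalities. The sketch below is a generic, minimal illustration of that distinction and is not the CSCMFT or CCM implementation, whose details are not given in this record. The function names, the toy token values, and the use of plain dot-product attention without learned projections are all assumptions made for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def score_fusion(rgb_logits, depth_logits):
    """Late (score-level) fusion: average the per-class probabilities
    that each modality's classifier produced independently."""
    p_rgb = softmax(rgb_logits)
    p_depth = softmax(depth_logits)
    return [(a + b) / 2 for a, b in zip(p_rgb, p_depth)]

def cross_attention(queries, keys_values):
    """Generic single-head dot-product cross-attention (assumption: no
    learned projections). Each query token from one modality is replaced
    by a weighted average of the other modality's tokens."""
    dim = len(queries[0])
    fused = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        fused.append([
            sum(w * kv[j] for w, kv in zip(weights, keys_values))
            for j in range(dim)
        ])
    return fused

# Hypothetical toy inputs: 3-class logits and 2-dimensional feature tokens.
fused_probs = score_fusion([2.0, 0.5, -1.0], [0.1, 1.5, 0.3])
rgb_tokens = [[1.0, 0.0], [0.0, 1.0]]
depth_tokens = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
rgb_attended = cross_attention(rgb_tokens, depth_tokens)
```

In score fusion the modalities meet only at the final probabilities, whereas in the attention sketch each RGB token is rebuilt from depth tokens before any classification, which is the kind of feature-level interaction the abstract refers to in general terms.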

Details

Language :
English
ISSN :
01678655
Volume :
168
Database :
Academic Search Index
Journal :
Pattern Recognition Letters
Publication Type :
Academic Journal
Accession number :
162891964
Full Text :
https://doi.org/10.1016/j.patrec.2023.02.024