
An efficient motion visual learning method for video action recognition.

Authors :
Wang, Bin
Chang, Faliang
Liu, Chunsheng
Wang, Wenqian
Ma, Ruiyi
Source :
Expert Systems with Applications, Dec 2024, Part B, Vol. 255.
Publication Year :
2024

Abstract

Currently, efficient spatio-temporal information modeling is one of the key challenges in action recognition. Previous approaches focus on enhancing backbone features individually through hierarchical structures, and most of them fail to adequately balance the interaction of features within the structure. In this work, we propose an effective Multi-dimensional Adaptive Fusion Network (MDAF-Net), which can be embedded into mainstream action recognition backbones in a plug-and-play manner to fully activate the transfer and representation of action features in the deep network. Specifically, our MDAF-Net contains two main components: the Adaptive Temporal Capture Module (ATCM) and the Extended Spatial and Channel Module (ESCM). The ATCM effectively suppresses the over-expression of similar features in adjacent frames and activates the expression of motion flow information. The ESCM further improves temporal modeling efficiency by extending the spatial feature receptive field and enhancing channel attention. Extensive experiments on several challenging action recognition benchmarks, such as Something-Something V1&V2 and Kinetics-400, demonstrate that the proposed MDAF-Net achieves state-of-the-art and competitive performance. [ABSTRACT FROM AUTHOR]
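The abstract describes MDAF-Net only at the architectural level (two plug-and-play modules inserted into an existing backbone), so the following PyTorch sketch is purely illustrative: the temporal-difference gating used for "ATCMSketch", the dilated convolution plus channel attention used for "ESCMSketch", and the residual wrapping in "MDAFBlockSketch" are assumptions chosen to match the abstract's description, not the authors' actual design.

```python
# Minimal sketch of a plug-and-play module wrapped around a backbone block.
# All module internals below are assumptions; only the high-level structure
# (ATCM + ESCM inserted into a backbone) follows the abstract.
import torch
import torch.nn as nn


class ATCMSketch(nn.Module):
    """Assumed sketch: gate each frame by its difference from the previous frame,
    so near-duplicate content in adjacent frames is down-weighted and motion kept."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv3d(channels, channels, kernel_size=1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, T, H, W)
        diff = x - torch.roll(x, shifts=1, dims=2)        # frame-to-frame difference
        diff[:, :, 0] = 0                                 # first frame has no predecessor
        return x * self.gate(diff)                        # emphasize motion-related features


class ESCMSketch(nn.Module):
    """Assumed sketch: enlarge the spatial receptive field with a dilated conv and
    re-weight channels with squeeze-and-excitation-style attention."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.spatial = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3),
                                 padding=(0, 2, 2), dilation=(1, 2, 2))
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.spatial(x)
        return x * self.channel(x)


class MDAFBlockSketch(nn.Module):
    """Plug-and-play wrapper: apply the two modules to a backbone block's output
    through a residual connection, leaving the backbone block itself untouched."""
    def __init__(self, backbone_block: nn.Module, channels: int):
        super().__init__()
        self.backbone_block = backbone_block
        self.atcm = ATCMSketch(channels)
        self.escm = ESCMSketch(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.backbone_block(x)
        return out + self.escm(self.atcm(out))


if __name__ == "__main__":
    # Toy check: wrap an identity "backbone block" and pass a clip tensor through it.
    block = MDAFBlockSketch(nn.Identity(), channels=16)
    clip = torch.randn(2, 16, 8, 14, 14)  # (batch, channels, frames, height, width)
    print(block(clip).shape)              # torch.Size([2, 16, 8, 14, 14])
```

The residual wrapping is what makes such a module plug-and-play in practice: the backbone block's output passes through unchanged apart from an additive refinement, so the module can be inserted into a pretrained network without altering its existing layers.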

Details

Language :
English
ISSN :
09574174
Volume :
255
Database :
Academic Search Index
Journal :
Expert Systems with Applications
Publication Type :
Academic Journal
Accession number :
178999059
Full Text :
https://doi.org/10.1016/j.eswa.2024.124596