
BDNet: a method based on forward and backward convolutional networks for action recognition in videos.

Authors :
Leng, Chuanjiang
Ding, Qichuan
Wu, Chengdong
Chen, Ange
Wang, Huan
Wu, Hao
Source :
Visual Computer. Jun 2024, Vol. 40 Issue 6, p4133-4147. 15p.
Publication Year :
2024

Abstract

Human action recognition analyzes behavior in a scene according to the spatiotemporal features carried in image sequences. Existing works suffer from ineffective spatial–temporal feature learning. For short video sequences, the critical challenge is to extract informative spatiotemporal features from a limited-length video. For long video sequences, combining long-range contextual information can improve recognition performance. However, conventional methods primarily model an action's spatiotemporal features along a single direction, which makes it difficult to capture contextual information and ignores information from the opposite direction. This article proposes a bi-directional network to simulate the bi-directional Long Short-Term Memory (Bi-LSTM) processing of time series data. Specifically, two 3D Convolutional Neural Networks (3D CNNs) extract spatiotemporal features along the forward and backward image sequences of an action for each modality individually. After integrating the features of each branch, a dynamic-fusion strategy is applied to obtain a video-level prediction. We conducted comprehensive experiments on the action recognition datasets UCF101 and HMDB51 and achieved 98.0% and 81.4% recognition accuracy, respectively, while reducing the number of input RGB images by three quarters. [ABSTRACT FROM AUTHOR]
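
The abstract describes a two-branch design: one 3D CNN processes the frame sequence in forward temporal order, a second processes the temporally reversed sequence, and the branch features are then fused into a video-level prediction. The following is a minimal, hypothetical PyTorch sketch of that general idea only; the class name BiDirectional3DCNN, the torchvision r3d_18 backbones, and the simple concatenation-plus-linear fusion are illustrative assumptions based on the abstract, not the authors' BDNet architecture or its dynamic-fusion strategy.

# Hedged sketch of a bi-directional 3D-CNN classifier; all names and the
# fusion scheme are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18


class BiDirectional3DCNN(nn.Module):
    def __init__(self, num_classes: int = 101):
        super().__init__()
        # Two independent 3D CNN branches: one sees the clip in its original
        # (forward) temporal order, the other sees the temporally reversed clip.
        self.forward_branch = r3d_18(weights=None)
        self.backward_branch = r3d_18(weights=None)
        feat_dim = self.forward_branch.fc.in_features
        # Drop the backbones' classification heads; use them as feature extractors.
        self.forward_branch.fc = nn.Identity()
        self.backward_branch.fc = nn.Identity()
        # Stand-in for the paper's fusion: concatenate both branch features
        # and classify with a single linear layer.
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip shape: (batch, channels, time, height, width)
        fwd_feat = self.forward_branch(clip)
        bwd_feat = self.backward_branch(torch.flip(clip, dims=[2]))  # reverse the time axis
        return self.classifier(torch.cat([fwd_feat, bwd_feat], dim=1))


if __name__ == "__main__":
    model = BiDirectional3DCNN(num_classes=101)          # 101 classes, as in UCF101
    dummy_clip = torch.randn(2, 3, 16, 112, 112)          # two 16-frame RGB clips
    print(model(dummy_clip).shape)                        # torch.Size([2, 101])

In this sketch the backward branch simply receives a time-flipped copy of the same clip, which mirrors the abstract's Bi-LSTM analogy; per-modality branches and the dynamic-fusion step described in the paper are not reproduced here.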

Details

Language :
English
ISSN :
0178-2789
Volume :
40
Issue :
6
Database :
Academic Search Index
Journal :
Visual Computer
Publication Type :
Academic Journal
Accession number :
177714357
Full Text :
https://doi.org/10.1007/s00371-023-03073-9