
DenseGCN: A multi‐level and multi‐temporal graph convolutional network for action recognition.

Authors: Yu, Chengzhang; Bao, Wenxia
Source: IET Image Processing (Wiley-Blackwell). 10/16/2023, Vol. 17 Issue 12, p3401-3410. 10p.
Publication Year: 2023

Abstract

With the exponential growth of video data, action recognition has become an increasingly important area of study. Despite various advancements, achieving a balance between detection accuracy and lightness remains a formidable challenge, primarily due to the complexity of existing action recognition models. To address this issue, DenseGCN, a lightweight network designed to optimize both accuracy and efficiency, is developed. The aim is a recognition model that achieves high accuracy while remaining lightweight enough for real‐world applications. DenseGCN operates via a unique three‐level feature fusion system. The initial stage is the Multi‐level Fusion Network (MlFN), which contains dense connections and a Spatial‐Temporal Fusion Attention module (STF‐Att), designed to eliminate the bias in feature extraction caused by deep networks. In the next stage, RefineBone tackles optimization issues in low‐dimensional feature layers by leveraging high‐dimensional feature layers, thus avoiding gradient stacking. Finally, the Multi‐temporal Fusion Feature Pyramid Network (MF‐FPN) generates a discriminative classification feature map by repeatedly combining information from multiple dimensions. This strategy has proven successful in refining the extracted features, allowing for discriminative feature extraction even with a reduced number of channels. This efficient design not only contributes to further research on lightweight networks but also broadens the possibilities for real‐world deployment. On two large‐scale datasets, NTU RGB+D 60 and NTU RGB+D 120, DenseGCN outperformed other state‐of‐the‐art methods, achieving an accuracy of 92.7% on the X‐View benchmark of the NTU RGB+D 60 dataset. DenseGCN is 10.2× faster and 10× smaller than the spatial‐temporal graph attention network (STGAT) proposed in 2022 while retaining very competitive accuracy. The findings suggest that this model significantly improves the quality of feature extraction. As a result, DenseGCN presents a remarkable balance between accuracy and lightness. [ABSTRACT FROM AUTHOR]
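The abstract does not include implementation details, but the DenseNet-style dense connectivity that the MlFN stage builds on can be illustrated with a minimal NumPy sketch. Everything below (the toy skeleton graph, the `dense_gcn_block` helper, the shapes and growth rate) is an assumption for illustration only, not the paper's actual architecture, which also includes the STF‐Att module, RefineBone, and MF‐FPN:

```python
import numpy as np

def normalize_adjacency(A):
    # Symmetric normalization D^{-1/2} (A + I) D^{-1/2},
    # standard for graph convolutions over skeleton joints.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(X, A_norm, W):
    # One graph convolution: aggregate neighbor features, project, ReLU.
    return np.maximum(A_norm @ X @ W, 0.0)

def dense_gcn_block(X, A_norm, weights):
    # Dense connectivity: each layer receives the concatenation of all
    # previous feature maps, so early joint-level features stay directly
    # accessible to deeper layers instead of being washed out.
    feats = [X]
    for W in weights:
        inp = np.concatenate(feats, axis=1)
        feats.append(gcn_layer(inp, A_norm, W))
    return np.concatenate(feats, axis=1)

# Toy skeleton: 5 joints connected in a chain.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
A_norm = normalize_adjacency(A)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))          # 5 joints, 8 input channels
growth = 4                               # channels added per dense layer
weights = [rng.standard_normal((8 + i * growth, growth)) * 0.1
           for i in range(3)]

out = dense_gcn_block(X, A_norm, weights)
print(out.shape)                         # (5, 8 + 3*4) = (5, 20)
```

The concatenation pattern is why a dense block can stay discriminative with few channels per layer: the final feature map carries all intermediate representations, not just the last one.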

Details

Language: English
ISSN: 1751-9659
Volume: 17
Issue: 12
Database: Academic Search Index
Journal: IET Image Processing (Wiley-Blackwell)
Publication Type: Academic Journal
Accession Number: 172804846
Full Text: https://doi.org/10.1049/ipr2.12872