Deeply-Supervised CNN Model for Action Recognition with Trainable Feature Aggregation

Authors :: Yang Li
Kan Li
Xinxin Wang
Source :: IJCAI
Publication Year :: 2018
Publisher :: International Joint Conferences on Artificial Intelligence Organization, 2018.
Abstract: In this paper, we propose a deeply-supervised CNN model for action recognition that fully exploits powerful hierarchical features of CNNs. In this model, we build multi-level video representations by applying our proposed aggregation module at different convolutional layers. Moreover, we train this model in a deep supervision manner, which brings improvement in both performance and efficiency. Meanwhile, in order to capture the temporal structure as well as preserve more details about actions, we propose a trainable aggregation module. It models the temporal evolution of each spatial location and projects them into a semantic space using the Vector of Locally Aggregated Descriptors (VLAD) technique. This deeply-supervised CNN model integrating the powerful aggregation module provides a promising solution to recognize actions in videos. We conduct experiments on two action recognition datasets: HMDB51 and UCF101. Results show that our model outperforms the state-of-the-art methods.
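The abstract describes the trainable aggregation module only at a high level. The following is a minimal sketch of the general VLAD idea it builds on. It uses the NetVLAD-style soft assignment, where a softmax over learnable linear scores replaces hard nearest-centre assignment so the step is differentiable and trainable. It is not the authors' implementation. The function names, the concatenate-over-time stand-in for "modelling the temporal evolution of each spatial location", and all toy numbers are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _l2_normalize(v: Sequence[float], eps: float = 1e-12) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def _softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def location_trajectories(frames: Sequence[Sequence[Vector]]) -> List[Vector]:
    """Hypothetical stand-in for temporal modelling (not the paper's method).

    frames[t][i] is the C-dim feature at spatial location i in frame t.
    Concatenating each location's features across the T frames keeps their
    temporal order and yields one T*C descriptor per spatial location.
    """
    num_locations = len(frames[0])
    return [[v for frame in frames for v in frame[i]] for i in range(num_locations)]


def soft_vlad(descriptors: Sequence[Vector], centers: Sequence[Vector],
              weights: Sequence[Vector], biases: Sequence[float]) -> Vector:
    """NetVLAD-style soft-assignment VLAD (assumed variant, for illustration).

    descriptors: N local descriptors of dimension D
    centers:     K cluster centres of dimension D
    weights, biases: K assignment vectors (dim D) and K scalar biases
    Returns a flat, L2-normalised vector of length K*D.
    """
    K, D = len(centers), len(centers[0])
    residuals = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        # Soft assignment of descriptor x to each of the K centres.
        assign = _softmax([_dot(w, x) + b for w, b in zip(weights, biases)])
        for k in range(K):
            for d in range(D):
                # Accumulate the assignment-weighted residual to centre k.
                residuals[k][d] += assign[k] * (x[d] - centers[k][d])
    # Intra-normalisation: L2-normalise each centre's residual block,
    # then L2-normalise the concatenated vector.
    blocks = [_l2_normalize(r) for r in residuals]
    return _l2_normalize([v for block in blocks for v in block])


# Toy usage: T=2 frames, 3 spatial locations, C=1 channel.
frames = [
    [[0.1], [0.9], [0.5]],
    [[0.2], [0.8], [0.4]],
]
desc = location_trajectories(frames)      # 3 descriptors of length 2
centers = [[0.0, 0.0], [1.0, 1.0]]        # K=2 centres, D=2
weights = [[-1.0, -1.0], [1.0, 1.0]]
biases = [0.0, 0.0]
vec = soft_vlad(desc, centers, weights, biases)
print(len(vec))  # 4 (= K * D)
```

In a trainable module of this kind, the centres, assignment weights and biases would be learned jointly with the CNN by backpropagation rather than fixed by hand as in the toy example.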

Subjects :: Feature aggregation
Computer science
Action recognition
Pattern recognition
Artificial intelligence

Database :: OpenAIRE
Journal :: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Accession number :: edsair.doi...........8c878fe54a5d450eb69546192bc428a6
Full Text :: https://doi.org/10.24963/ijcai.2018/112
