Deeply-Supervised CNN Model for Action Recognition with Trainable Feature Aggregation

Authors :
Yang Li
Kan Li
Xinxin Wang
Source :
IJCAI
Publication Year :
2018
Publisher :
International Joint Conferences on Artificial Intelligence Organization, 2018.

Abstract

In this paper, we propose a deeply-supervised CNN model for action recognition that fully exploits the powerful hierarchical features of CNNs. We build multi-level video representations by applying our proposed aggregation module at different convolutional layers, and we train the model with deep supervision, which improves both performance and efficiency. To capture temporal structure while preserving more detail about actions, we propose a trainable aggregation module. It models the temporal evolution of each spatial location and projects them into a semantic space using the Vector of Locally Aggregated Descriptors (VLAD) technique. Integrated with this aggregation module, the deeply-supervised CNN model provides a promising solution for recognizing actions in videos. We conduct experiments on two action recognition datasets, HMDB51 and UCF101. Results show that our model outperforms state-of-the-art methods.
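
The abstract does not specify the aggregation module in detail, so the following is only a minimal sketch of generic soft-assignment VLAD pooling, not the authors' implementation. It assumes local descriptors have already been extracted (for example, one D-dimensional vector per spatial location and frame) and that K cluster centers are available. The function name `soft_vlad`, the sharpness parameter `alpha`, and the array shapes are all hypothetical choices made for illustration. In a trainable variant, the centers and the assignment weights would be learned by backpropagation instead of being fixed as they are here.

```python
import numpy as np


def soft_vlad(features, centers, alpha=10.0):
    """Aggregate N local descriptors into one VLAD vector (illustrative only).

    features: array of shape (N, D), local descriptors
    centers:  array of shape (K, D), cluster centers (fixed here, learnable in
              a trainable module)
    alpha:    hypothetical sharpness of the soft assignment
    returns:  L2-normalized vector of shape (K * D,)
    """
    # Residual of every descriptor to every center: shape (N, K, D)
    residuals = features[:, None, :] - centers[None, :, :]

    # Soft assignment: softmax over centers of the negative squared distance
    logits = -alpha * np.sum(residuals ** 2, axis=2)      # (N, K)
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # Weighted sum of residuals per center: shape (K, D)
    vlad = np.sum(weights[:, :, None] * residuals, axis=0)

    # Intra-normalization per center, then global L2 normalization
    vlad /= np.maximum(np.linalg.norm(vlad, axis=1, keepdims=True), 1e-12)
    flat = vlad.reshape(-1)
    return flat / max(np.linalg.norm(flat), 1e-12)


# Example: 50 descriptors of dimension 8 aggregated over 4 centers
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(50, 8))
cluster_centers = rng.normal(size=(4, 8))
video_vector = soft_vlad(descriptors, cluster_centers)
print(video_vector.shape)
```

The output length depends only on K and D, not on the number of descriptors, which is what lets a VLAD-style layer turn a variable number of local features into a fixed-size video representation.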

Details

Database :
OpenAIRE
Journal :
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Accession number :
edsair.doi...........8c878fe54a5d450eb69546192bc428a6
Full Text :
https://doi.org/10.24963/ijcai.2018/112