Deeply-Supervised CNN Model for Action Recognition with Trainable Feature Aggregation
- Source :
- IJCAI
- Publication Year :
- 2018
- Publisher :
- International Joint Conferences on Artificial Intelligence Organization, 2018.
Abstract
- In this paper, we propose a deeply-supervised CNN model for action recognition that fully exploits the powerful hierarchical features of CNNs. In this model, we build multi-level video representations by applying our proposed aggregation module at different convolutional layers. Moreover, we train the model with deep supervision, which improves both performance and efficiency. To capture temporal structure while preserving more detail about actions, we propose a trainable aggregation module. It models the temporal evolution of the features at each spatial location and projects these features into a semantic space using the Vector of Locally Aggregated Descriptors (VLAD) technique. This deeply-supervised CNN model, integrated with the aggregation module, provides a promising solution for recognizing actions in videos. We conduct experiments on two action recognition datasets, HMDB51 and UCF101. Results show that our model outperforms state-of-the-art methods.
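The abstract names VLAD but does not give the module's equations. As a rough illustration only, the sketch below shows a generic soft-assignment VLAD (the differentiable form popularized by NetVLAD), not the authors' aggregation module. The function name `soft_vlad`, the `alpha` sharpness parameter, and the toy descriptors and centers are assumptions made for this example. In a trainable version the centers would be learned with the rest of the network; here they are fixed inputs.

```python
import math

def soft_vlad(descriptors, centers, alpha=1.0):
    """Aggregate D-dim descriptors into one K*D vector via soft-assignment VLAD.

    descriptors: list of N lists of length D (e.g. one per spatial location and frame).
    centers: list of K lists of length D (hypothetical cluster centers).
    """
    k, d = len(centers), len(centers[0])
    vlad = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Softmax over negative squared distances gives the assignment weights.
        logits = [-alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        for j, c in enumerate(centers):
            w = exps[j] / total
            for i in range(d):
                # Accumulate the weighted residual between descriptor and center.
                vlad[j][i] += w * (x[i] - c[i])
    # Normalize each center's residual sum, flatten, then L2-normalize the result.
    flat = []
    for row in vlad:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        flat.extend(v / norm for v in row)
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]

# Toy usage: four 2-D descriptors aggregated against two centers.
descriptors = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.2], [0.1, 0.8]]
centers = [[0.0, 1.0], [1.0, 0.0]]
video_vector = soft_vlad(descriptors, centers, alpha=5.0)
```

The output has a fixed length of K times D regardless of how many descriptors go in, which is what lets a variable-length video be summarized as a single vector.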
Details
- Database :
- OpenAIRE
- Journal :
- Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
- Accession number :
- edsair.doi...........8c878fe54a5d450eb69546192bc428a6
- Full Text :
- https://doi.org/10.24963/ijcai.2018/112