Temporal Pyramid Pooling-Based Convolutional Neural Network for Action Recognition.

Authors :
Wang, Peng
Cao, Yuanzhouhan
Shen, Chunhua
Liu, Lingqiao
Shen, Heng Tao
Source :
IEEE Transactions on Circuits & Systems for Video Technology. Dec. 2017, Vol. 27, Issue 12, pp. 2613-2622 (10 pages).
Publication Year :
2017

Abstract

Encouraged by the success of convolutional neural networks (CNNs) in image classification, much recent effort has been spent on applying CNNs to video-based action recognition problems. One challenge is that a video contains a varying number of frames, which is incompatible with the standard input format of CNNs. Existing methods either handle this issue by directly sampling a fixed number of frames or bypass it by introducing a 3D convolutional layer, which performs convolution in the spatio-temporal domain. In this paper, we propose a novel network structure that allows an arbitrary number of frames as the network input. The key to our solution is to introduce a module consisting of an encoding layer and a temporal pyramid pooling layer. The encoding layer maps the activation from the previous layers to a feature vector suitable for pooling, whereas the temporal pyramid pooling layer converts multiple frame-level activations into a fixed-length video-level representation. In addition, we adopt a feature concatenation layer that combines the appearance and motion information. Compared with the frame sampling strategy, our method avoids the risk of missing any important frames. Compared with the 3D convolutional method, which requires a huge video data set for network training, our model can be learned on a small target data set because we can leverage the off-the-shelf image-level CNN for model parameter initialization. Experiments on three challenging data sets, Hollywood2, HMDB51, and UCF101, demonstrate the effectiveness of the proposed network. [ABSTRACT FROM AUTHOR]
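
The idea of turning a variable number of frame-level activations into a fixed-length video-level vector can be illustrated with a minimal NumPy sketch. The abstract does not state the pooling operator or the pyramid configuration, so max pooling and the levels (1, 2, 4) used here are assumptions, and the function name `temporal_pyramid_pool` is hypothetical rather than taken from the paper.

```python
import numpy as np


def temporal_pyramid_pool(frame_features, levels=(1, 2, 4)):
    """Pool a (num_frames, dim) array into one vector of length sum(levels) * dim.

    Assumptions (not from the abstract): max pooling within each temporal
    segment, and pyramid levels that split the video into 1, 2, and 4 segments.
    """
    frame_features = np.asarray(frame_features, dtype=float)
    num_frames, dim = frame_features.shape
    pooled = []
    for num_segments in levels:
        # Evenly spaced segment boundaries along the time axis.
        edges = np.linspace(0, num_frames, num_segments + 1)
        for i in range(num_segments):
            # Clamp so that every segment covers at least one frame,
            # even when the video has fewer frames than segments.
            start = min(int(np.floor(edges[i])), num_frames - 1)
            end = max(int(np.ceil(edges[i + 1])), start + 1)
            pooled.append(frame_features[start:end].max(axis=0))
    return np.concatenate(pooled)


# Two videos of different lengths yield vectors of the same length.
rng = np.random.default_rng(0)
short_video = rng.normal(size=(5, 8))    # 5 frames, 8-dim features per frame
long_video = rng.normal(size=(37, 8))    # 37 frames, same feature dimension
print(temporal_pyramid_pool(short_video).shape)
print(temporal_pyramid_pool(long_video).shape)
```

Because every frame falls into at least one segment at each level, no frame is discarded, which is the property the abstract contrasts with fixed-size frame sampling.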

Details

Language :
English
ISSN :
10518215
Volume :
27
Issue :
12
Database :
Academic Search Index
Journal :
IEEE Transactions on Circuits & Systems for Video Technology
Publication Type :
Academic Journal
Accession number :
126820461
Full Text :
https://doi.org/10.1109/TCSVT.2016.2576761