
Spatial–temporal pooling for action recognition in videos.

Authors :
Wang, Jiaming
Shao, Zhenfeng
Huang, Xiao
Lu, Tao
Zhang, Ruiqian
Lv, Xianwei
Source :
Neurocomputing. Sep 2021, Vol. 451, p. 265-278. 14p.
Publication Year :
2021

Abstract

• We propose an end-to-end approach with a novel temporal–spatial pooling block (named STP) for action classification, which learns to pool discriminative frames and pixels in a given clip. Our method achieves better performance than other state-of-the-art methods.
• We propose an STP loss function that learns a sparse importance score in the temporal dimension, discarding redundant or invalid frames.
• We present a ferryboat video database (named Ferryboat-4) for ferry action recognition. The database includes four action categories: Inshore, Offshore, Traffic, and Negative. We evaluate the proposed STP and other state-of-the-art models on this database.

Deep convolutional neural networks have demonstrated great effectiveness in action recognition with both RGB and optical flow inputs over the past decade. However, existing studies generally treat all frames and pixels equally, potentially leading to poor model robustness. In this paper, we propose a novel parameter-free spatial–temporal pooling block (referred to as STP) for action recognition in videos to address this challenge. STP learns spatial and temporal weights, which are then used to guide information compression. Unlike other temporal pooling layers, STP is more efficient because it discards non-informative frames in a given clip. In addition, STP applies a novel loss function that forces the model to learn from sparse and discriminative frames. Moreover, we introduce a dataset for ferry action classification, named Ferryboat-4, which includes four categories: Inshore, Offshore, Traffic, and Negative. This dataset can be used to identify ferries with abnormal behaviors, providing essential information to support the supervision, management, and monitoring of ships. All videos are acquired via real-world cameras. We perform extensive experiments on publicly available datasets as well as Ferryboat-4 and find that the proposed method outperforms several state-of-the-art methods in action classification. Source code and datasets are available at https://github.com/jiaming-wang/STP. [ABSTRACT FROM AUTHOR]
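To make the idea of parameter-free spatial–temporal pooling with a sparsity loss concrete, the following is a minimal PyTorch sketch. The abstract does not give the exact STP formulation, so the energy-based weighting, the entropy-style sparsity penalty, and the function names (`spatial_temporal_pool`, `sparsity_loss`) are illustrative assumptions, not the authors' implementation; the real code is in the linked repository.

```python
# Hypothetical sketch of a parameter-free spatial-temporal pooling block with a
# sparsity loss. The weighting scheme and loss are assumptions for illustration.
import torch
import torch.nn.functional as F


def spatial_temporal_pool(features: torch.Tensor):
    """Pool clip features (B, T, C, H, W) into a clip descriptor (B, C).

    Weights are derived from the features themselves (no learnable parameters):
    spatial weights from per-pixel activation energy, temporal weights from
    per-frame energy, each normalized with a softmax.
    """
    b, t, c, h, w = features.shape

    # Spatial weights: softmax over H*W of per-pixel activation energy.
    spatial_energy = features.pow(2).mean(dim=2)                 # (B, T, H, W)
    spatial_w = F.softmax(spatial_energy.flatten(2), dim=-1)     # (B, T, H*W)
    spatial_w = spatial_w.view(b, t, 1, h, w)

    # Spatially weighted pooling to per-frame descriptors.
    frame_desc = (features * spatial_w).sum(dim=(3, 4))          # (B, T, C)

    # Temporal weights: softmax over T of per-frame energy.
    temporal_energy = frame_desc.pow(2).mean(dim=2)              # (B, T)
    temporal_w = F.softmax(temporal_energy, dim=1)               # (B, T)

    # Temporally weighted pooling to a single clip descriptor.
    clip_desc = (frame_desc * temporal_w.unsqueeze(-1)).sum(dim=1)  # (B, C)
    return clip_desc, temporal_w


def sparsity_loss(temporal_w: torch.Tensor) -> torch.Tensor:
    """Illustrative stand-in for the STP loss: an entropy penalty that pushes
    the temporal importance scores toward a sparse (peaky) distribution, so
    redundant frames receive near-zero weight."""
    return -(temporal_w * (temporal_w + 1e-8).log()).sum(dim=1).mean()
```

In such a setup, the total training objective would combine the standard classification loss on the pooled clip descriptor with a weighted `sparsity_loss` term on the temporal weights; the trade-off coefficient is a design choice not specified in this record.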

Details

Language :
English
ISSN :
0925-2312
Volume :
451
Database :
Academic Search Index
Journal :
Neurocomputing
Publication Type :
Academic Journal
Accession number :
150770449
Full Text :
https://doi.org/10.1016/j.neucom.2021.04.071