STA3D: Spatiotemporally attentive 3D network for video saliency prediction.

Authors :: Zou, Wenbin
Zhuo, Shengkai
Tang, Yi
Tian, Shishun
Li, Xia
Xu, Chen
Source :: Pattern Recognition Letters. Jul 2021, Vol. 147, pp. 78-84. 7 pp.
Publication Year :: 2021
Abstract:
• Attention guidance is important for video saliency prediction based on 3D CNNs.
• A spatiotemporally attentive 3D CNN for robust video saliency prediction is proposed.
• An adaptive upsampling module for refining spatial features is proposed.
• A frame-wise attention module for propagating temporal features is proposed.
• The effectiveness of the proposed method is comprehensively evaluated.

3D fully convolutional networks (FCNs), which jointly leverage spatial and temporal cues, have achieved great success in video saliency prediction. However, they still have limitations in some challenging cases, e.g., fixation shift. To address this issue, we propose a SpatioTemporally Attentive 3D Network (STA3D) that selectively propagates significant temporal features and refines spatial features in a 3D FCN for video saliency prediction. Extensive experiments on three standard datasets demonstrate the superiority of the proposed model over state-of-the-art methods. [ABSTRACT FROM AUTHOR]