Learning Gaze Transition for Gaze Target Detection in Video
- Authors
YANG Xingming, SHI Junbiao, LI Ziqiang, WU Kewei, and XIE Zhao
- Subjects
GAZE, LEARNING
- Abstract
Gaze target detection in video aims to localize the gaze target in each video frame. A person gazes at different targets at different times, and in the transition segment from one gaze target to another, the person may not be gazing at any specific target. Gaze target detection methods built on an image transformer neglect this temporal transition segment, and the gaze direction within it may hinder gaze target detection in video. For gaze target detection in video, this paper proposes a gaze transition-based model, which contains a gaze direction guidance module and a gaze transition temporal fusion module. In the gaze direction guidance module, the position of the gaze target is used to learn a heatmap of the gaze direction. The gaze target is then detected under the guidance of this heatmap, which suppresses targets outside the gaze direction and predicts an accurate position for the gaze target. In the gaze transition temporal fusion module, the heatmaps over multiple frames form a spatial-temporal heatmap. To learn the changes in this spatial-temporal heatmap, the paper uses a bi-directional spatial-temporal convolutional long short-term memory (LSTM), which extracts a memory-based spatial-temporal heatmap. The gaze transition is described by introducing a Gaussian-based temporal model. To localize the temporal segment of a gaze transition with uncertain temporal length, the paper designs a Gaussian-based temporal fusion method, which estimates the gaze transition's start timestamp, end timestamp, and temporal length. By localizing the gaze transition segment, the transition's effect can be removed for gaze target detection. The gaze transition-based model is trained with a gaze direction loss, a gaze target existence loss, a gaze target heatmap loss, and a gaze transition temporal localization loss.
On the GazeFollow and VideoAttentionTarget datasets, the experimental results show that the gaze transition-based model outperforms the image transformer-based model for gaze target detection in video.
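The gaze direction guidance idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the heatmap shapes, the element-wise-product fusion, and all toy values below are assumptions chosen only to show how a direction heatmap can suppress targets outside the gaze direction.

```python
import numpy as np

def guide_by_gaze_direction(target_heatmap, direction_heatmap):
    """Modulate the gaze-target heatmap with the gaze-direction heatmap.

    An element-wise product (one plausible fusion; the paper's exact
    operation is not specified in the abstract) down-weights candidate
    targets lying outside the predicted gaze direction.
    """
    guided = target_heatmap * direction_heatmap
    return guided / (guided.max() + 1e-8)  # renormalize to [0, 1]

# Toy example (all values hypothetical): two candidate targets; the
# gaze-direction heatmap is high only around the lower-right candidate.
target = np.zeros((5, 5))
target[1, 1] = 0.9           # candidate A, outside the gaze direction
target[3, 4] = 0.8           # candidate B, along the gaze direction
direction = np.full((5, 5), 0.1)
direction[2:5, 3:5] = 1.0    # gaze-direction cone covers candidate B
guided = guide_by_gaze_direction(target, direction)
```

After guidance, the strongest peak of `guided` is candidate B, even though candidate A had the higher raw target score.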
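The Gaussian-based temporal localization of a transition segment with uncertain length can likewise be sketched. Again this is only an illustration under stated assumptions, not the paper's method: the per-frame transition scores, the weighted Gaussian fit, and the `mu ± k·sigma` rule for deriving start/end timestamps are all hypothetical choices.

```python
import numpy as np

def gaussian_transition_segment(transition_scores, k=1.5):
    """Fit a 1-D Gaussian over per-frame transition scores (hypothetical
    input: higher score = more likely a gaze transition at that frame)
    and derive a segment as mu +/- k * sigma.

    Returns (start, end, length) in frame units.
    """
    t = np.arange(len(transition_scores), dtype=float)
    w = np.clip(transition_scores, 0.0, None)
    w = w / w.sum()                             # scores -> weights
    mu = (w * t).sum()                          # weighted mean = center
    sigma = np.sqrt((w * (t - mu) ** 2).sum())  # weighted std = spread
    start = max(0.0, mu - k * sigma)
    end = min(len(t) - 1.0, mu + k * sigma)
    return start, end, end - start

# Toy scores peaking around frames 9-13, suggesting a transition there.
scores = np.array([0, 0, 0, 0, 0, 0, 0, 1, 3, 6,
                   8, 6, 3, 1, 0, 0, 0, 0, 0, 0], dtype=float)
start, end, length = gaussian_transition_segment(scores)
```

Because the segment is parameterized by a center and a spread rather than fixed boundaries, the same fit adapts to transitions of different durations, which matches the abstract's goal of handling an uncertain temporal length.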
- Published
2024