Learning attention modules for visual tracking.
- Source :
- Signal, Image & Video Processing; Nov 2022, Vol. 16, Issue 8, p2149-2156, 8p
- Publication Year :
- 2022
Abstract
- Siamese networks have been widely used in visual tracking. However, they struggle with complex appearance variations when discriminative background information is ignored and an offline training strategy is adopted. In this paper, we present a novel backbone network that combines a CNN model with an attention mechanism in the Siamese framework. The attention mechanism consists of a channel attention module and a spatial attention module. The channel attention module uses learned global information to selectively re-weight the convolutional features, which enhances the network's representation ability. In addition, the spatial attention module captures more contextual information and semantic features of target candidates. The designed attention-based backbone is lightweight and achieves real-time tracking performance. We use GOT-10k as the training set to adjust the model parameters offline. Extensive experimental evaluations on the OTB2015, VOT2016, VOT2018, GOT-10k and UAV123 datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art trackers.
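- To illustrate the kind of attention the abstract describes, the sketch below shows a channel attention module followed by a spatial attention module applied to a backbone feature map. This is a minimal, generic CBAM-style sketch in PyTorch; the class names, reduction ratio, and kernel size are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Re-weights feature channels using globally pooled descriptors
    (generic sketch, not the paper's exact design)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.max_pool = nn.AdaptiveMaxPool2d(1)  # global max pooling
        self.mlp = nn.Sequential(                # shared bottleneck MLP
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Combine global average- and max-pooled descriptors into channel weights.
        attn = torch.sigmoid(self.mlp(self.avg_pool(x)) + self.mlp(self.max_pool(x)))
        return x * attn                          # per-channel re-weighting


class SpatialAttention(nn.Module):
    """Highlights informative spatial locations with a single convolution
    over channel-pooled maps (again a generic sketch)."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_map = torch.mean(x, dim=1, keepdim=True)    # B x 1 x H x W
        max_map, _ = torch.max(x, dim=1, keepdim=True)  # B x 1 x H x W
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                                 # per-location re-weighting


if __name__ == "__main__":
    feat = torch.randn(1, 256, 31, 31)   # e.g. a Siamese backbone feature map
    feat = ChannelAttention(256)(feat)
    feat = SpatialAttention()(feat)
    print(feat.shape)                     # torch.Size([1, 256, 31, 31])
```

- In a Siamese tracker, both the template branch and the search branch would typically pass their backbone features through such modules before the cross-correlation step that produces the response map.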
Details
- Language :
- English
- ISSN :
- 1863-1703
- Volume :
- 16
- Issue :
- 8
- Database :
- Complementary Index
- Journal :
- Signal, Image & Video Processing
- Publication Type :
- Academic Journal
- Accession number :
- 159442190
- Full Text :
- https://doi.org/10.1007/s11760-022-02177-4