
First- And Third-Person Video Co-Analysis By Learning Spatial-Temporal Joint Attention

Authors:
Huangyue Yu
Yunfei Liu
Feng Lu
Minjie Cai
Source:
IEEE Transactions on Pattern Analysis and Machine Intelligence. 45:6631-6646
Publication Year:
2023
Publisher:
Institute of Electrical and Electronics Engineers (IEEE), 2023.

Abstract

Recent years have witnessed a tremendous increase in first-person videos captured by wearable devices. Such videos record information from a different perspective than the traditional third-person view and thus offer a wide range of potential uses. However, techniques for analyzing videos from the two views can be fundamentally different, let alone co-analyzing both views to exploit their shared information. In this paper, we take on the challenge of cross-view video co-analysis and deliver a novel learning-based method. At the core of our method is the notion of "joint attention": the shared attention regions that link the corresponding views and ultimately guide shared representation learning across views. To this end, we propose a multi-branch deep network that extracts cross-view joint attention and a shared representation from static frames with spatial constraints, in a self-supervised and simultaneous manner. In addition, by incorporating a temporal transition model of the joint attention, we obtain spatial-temporal joint attention that robustly captures the essential information extending through time. Our method outperforms the state of the art on standard cross-view video matching tasks on public datasets. Furthermore, we demonstrate how the learnt joint information benefits various applications through a set of qualitative and quantitative experiments.
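The abstract describes a multi-branch network in which each view's branch attends to shared regions and the attention-weighted features of matched first-/third-person frames are mapped into a common embedding space, trained self-supervised. The sketch below is only a rough illustration of that general idea, not the authors' implementation: the module names, layer sizes, attention-weighted pooling, and the InfoNCE-style matching loss are all assumptions standing in for the paper's actual architecture and training objective.

```python
# Minimal sketch (assumptions, not the paper's code): two view branches
# predict spatial attention maps and pool features into a shared space;
# a cross-view matching loss provides the self-supervised signal.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewBranch(nn.Module):
    """One branch: CNN features -> spatial attention -> pooled embedding."""
    def __init__(self, feat_dim=256, embed_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(   # stand-in feature extractor
            nn.Conv2d(3, feat_dim, 7, stride=4, padding=3), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attn_head = nn.Conv2d(feat_dim, 1, 1)  # 1-channel attention logits
        self.proj = nn.Linear(feat_dim, embed_dim)

    def forward(self, x):
        f = self.backbone(x)                                 # (B, C, H, W)
        a = self.attn_head(f)                                # (B, 1, H, W)
        a = torch.softmax(a.flatten(2), -1).view_as(a)       # normalize over space
        pooled = (f * a).flatten(2).sum(-1)                  # attention-weighted pooling
        return F.normalize(self.proj(pooled), dim=-1), a

def matching_loss(z_ego, z_exo, temperature=0.1):
    """Symmetric InfoNCE over matched first-/third-person frame pairs."""
    logits = z_ego @ z_exo.t() / temperature
    target = torch.arange(z_ego.size(0), device=z_ego.device)
    return 0.5 * (F.cross_entropy(logits, target) +
                  F.cross_entropy(logits.t(), target))

ego_branch, exo_branch = ViewBranch(), ViewBranch()
ego, exo = torch.randn(4, 3, 224, 224), torch.randn(4, 3, 224, 224)
(z1, a1), (z2, a2) = ego_branch(ego), exo_branch(exo)
loss = matching_loss(z1, z2)  # attention maps a1, a2 are shaped by this signal
```

Because the only supervision is which frame pairs correspond, the attention heads are pushed toward regions that are informative in both views, which is one plausible reading of how "joint attention" can emerge without region-level labels. The paper's spatial constraints and temporal transition model are not reproduced here.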

Details

ISSN:
1939-3539 (electronic) and 0162-8828 (print)
Volume:
45
Database:
OpenAIRE
Journal:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Accession number:
edsair.doi.dedup.....3507920f31a2ef2820cdcbb2cfb45c09
Full Text:
https://doi.org/10.1109/tpami.2020.3030048