Local-Global Graph Pooling via Mutual Information Maximization for Video-Paragraph Retrieval.
- Source :
- IEEE Transactions on Circuits & Systems for Video Technology. Oct 2022, Vol. 32 Issue 10, p7133-7146. 14p.
- Publication Year :
- 2022
Abstract
- As a task of cross-modal retrieval between long videos and paragraphs, video-paragraph retrieval is a non-trivial task. Unlike traditional video-text retrieval, the video in video-paragraph retrieval usually contains multiple clips. Each clip corresponds to a descriptive sentence; all the sentences constitute the corresponding paragraph of the video. Previous methods for video-paragraph retrieval usually encode videos and paragraphs from segment-level (clips and sentences) and overall-level (videos and paragraphs). However, there are also contents about actions and objects that exist in the segment. Hence, we propose a Local-Global Graph Pooling Network (LGGP) via Mutual Information Maximization for video-paragraph retrieval. Our model disentangles videos and paragraphs into four levels: overall-level, segment-level, motion-level, and object-level. We construct the Hierarchical Local Graph (segment-level, motion-level, and object-level) and the Hierarchical Global Graph (overall-level, segment-level, motion-level, and object-level), respectively, for semantic interaction among different levels. Meanwhile, to obtain hierarchical pooling features with fine-grained semantic information, we design hierarchical graph pooling methods to maximize the mutual information between pooling features and corresponding graph nodes. We evaluate our model on two video-paragraph retrieval datasets with three different video features. The experimental results show that our model establishes state-of-the-art results for video-paragraph retrieval. Our code will be released at https://github.com/PengchengZhang1997/LGGP. [ABSTRACT FROM AUTHOR]
- Subjects :
- *VIDEOS
- *FEATURE extraction
Details
- Language :
- English
- ISSN :
- 10518215
- Volume :
- 32
- Issue :
- 10
- Database :
- Academic Search Index
- Journal :
- IEEE Transactions on Circuits & Systems for Video Technology
- Publication Type :
- Academic Journal
- Accession number :
- 160693873
- Full Text :
- https://doi.org/10.1109/TCSVT.2022.3176866