Local-Global Graph Pooling via Mutual Information Maximization for Video-Paragraph Retrieval.

Authors :
Zhang, Pengcheng
Zhao, Zhou
Wang, Nannan
Yu, Jun
Wu, Fei
Source :
IEEE Transactions on Circuits & Systems for Video Technology. Oct2022, Vol. 32 Issue 10, p7133-7146. 14p.
Publication Year :
2022

Abstract

As a task of cross-modal retrieval between long videos and paragraphs, video-paragraph retrieval is a non-trivial task. Unlike traditional video-text retrieval, the video in video-paragraph retrieval usually contains multiple clips. Each clip corresponds to a descriptive sentence; all the sentences constitute the corresponding paragraph of the video. Previous methods for video-paragraph retrieval usually encode videos and paragraphs from segment-level (clips and sentences) and overall-level (videos and paragraphs). However, there are also contents about actions and objects that exist in the segment. Hence, we propose a Local-Global Graph Pooling Network (LGGP) via Mutual Information Maximization for video-paragraph retrieval. Our model disentangles videos and paragraphs into four levels: overall-level, segment-level, motion-level, and object-level. We construct the Hierarchical Local Graph (segment-level, motion-level, and object-level) and the Hierarchical Global Graph (overall-level, segment-level, motion-level, and object-level), respectively, for semantic interaction among different levels. Meanwhile, to obtain hierarchical pooling features with fine-grained semantic information, we design hierarchical graph pooling methods to maximize the mutual information between pooling features and corresponding graph nodes. We evaluate our model on two video-paragraph retrieval datasets with three different video features. The experimental results show that our model establishes state-of-the-art results for video-paragraph retrieval. Our code will be released at https://github.com/PengchengZhang1997/LGGP. [ABSTRACT FROM AUTHOR]

Subjects

Subjects :
*VIDEOS
*FEATURE extraction

Details

Language :
English
ISSN :
10518215
Volume :
32
Issue :
10
Database :
Academic Search Index
Journal :
IEEE Transactions on Circuits & Systems for Video Technology
Publication Type :
Academic Journal
Accession number :
160693873
Full Text :
https://doi.org/10.1109/TCSVT.2022.3176866