Video Multimodal Entity Linking via Multi-Perspective Enhanced Subgraph Contrastive Network.
- Source :
- International Journal of Software Engineering & Knowledge Engineering; Nov2024, Vol. 34 Issue 11, p1757-1781, 25p
- Publication Year :
- 2024
-
Abstract
- Video Multimodal Entity Linking (VMEL) is the task of linking entities mentioned in videos to entities in multimodal knowledge bases. However, current entity linking methods focus primarily on the text and image modalities, neglecting the significance of the video modality. To address this challenge, we propose a novel framework called the multi-perspective enhanced Subgraph Contrastive Network (SCMEL) and construct a VMEL dataset named SceneMEL, based on the tourism domain. We first integrate the textual, auditory, and visual contexts of videos to generate a comprehensive, high-recall candidate entity set. Furthermore, a semantic-enhanced video description subgraph generation module converts videos into a multimodal feature graph structure and performs subgraph sampling on the domain-specific knowledge graph. Lastly, we conduct contrastive learning between the video subgraphs and the knowledge graph subgraphs from local perspectives (text, audio, visual) as well as a global perspective, to capture fine-grained semantic information about videos and entities. A series of experimental results on SceneMEL demonstrates the effectiveness of the proposed approach. [ABSTRACT FROM AUTHOR]
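- The contrastive step described in the abstract, aligning video-subgraph embeddings with knowledge-graph-subgraph embeddings, can be illustrated with a standard InfoNCE-style objective over in-batch negatives. This is a minimal sketch, not the paper's implementation: the function name, temperature value, and use of in-batch negatives are all assumptions for illustration.

```python
import numpy as np

def info_nce_loss(video_emb, entity_emb, temperature=0.07):
    """InfoNCE loss aligning paired embeddings (illustrative sketch).

    video_emb, entity_emb: (batch, dim) arrays; row i of each forms a
    positive pair (a video subgraph and its linked KG subgraph), and all
    other rows in the batch serve as negatives. Temperature 0.07 is an
    assumed value, not taken from the paper.
    """
    # L2-normalize so the dot product is cosine similarity
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    e = entity_emb / np.linalg.norm(entity_emb, axis=1, keepdims=True)
    logits = v @ e.T / temperature                    # (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives sit on the diagonal: video i pairs with entity i
    return float(-np.mean(np.diag(log_softmax)))
```

Under this formulation, a local-perspective loss would apply the same objective per modality (text, audio, visual), while the global perspective applies it to fused subgraph representations.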
Details
- Language :
- English
- ISSN :
- 02181940
- Volume :
- 34
- Issue :
- 11
- Database :
- Complementary Index
- Journal :
- International Journal of Software Engineering & Knowledge Engineering
- Publication Type :
- Academic Journal
- Accession number :
- 180974365
- Full Text :
- https://doi.org/10.1142/S0218194024500360