Video Multimodal Entity Linking via Multi-Perspective Enhanced Subgraph Contrastive Network.
- Source :
- International Journal of Software Engineering & Knowledge Engineering; Nov2024, Vol. 34 Issue 11, p1757-1781, 25p
- Publication Year :
- 2024
-
Abstract
- Video Multimodal Entity Linking (VMEL) is the task of linking entities mentioned in videos to entities in multimodal knowledge bases. However, current entity linking methods focus primarily on the text and image modalities, neglecting the significance of the video modality. To address this challenge, we propose a novel framework called the multi-perspective enhanced Subgraph Contrastive Network (SCMEL) and construct a VMEL dataset named SceneMEL, based on the tourism domain. We first integrate the textual, auditory, and visual contexts of videos to generate a comprehensive, high-recall candidate entity set. Furthermore, a semantic-enhanced video description subgraph generation module converts videos into a multimodal feature graph structure and performs subgraph sampling on the domain-specific knowledge graph. Lastly, we conduct contrastive learning between the video subgraphs and the knowledge graph subgraphs from local perspectives (text, audio, visual) as well as a global perspective, to capture fine-grained semantic information about videos and entities. A series of experimental results on SceneMEL demonstrates the effectiveness of the proposed approach. [ABSTRACT FROM AUTHOR]
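- The contrastive step described in the abstract, aligning video-subgraph embeddings with knowledge-graph-subgraph embeddings, can be illustrated with a standard InfoNCE-style objective over in-batch negatives. This is a minimal sketch, not the paper's implementation: the function name, temperature value, and use of in-batch negatives are all assumptions for illustration.

```python
import numpy as np

def info_nce_loss(video_emb, entity_emb, temperature=0.07):
    """InfoNCE loss aligning paired embeddings (illustrative sketch).

    video_emb, entity_emb: (batch, dim) arrays; row i of each forms a
    positive pair (a video subgraph and its linked KG subgraph), and all
    other rows in the batch serve as negatives. Temperature 0.07 is an
    assumed value, not taken from the paper.
    """
    # L2-normalize so the dot product is cosine similarity
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    e = entity_emb / np.linalg.norm(entity_emb, axis=1, keepdims=True)
    logits = v @ e.T / temperature                    # (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives sit on the diagonal: video i pairs with entity i
    return float(-np.mean(np.diag(log_softmax)))
```

Under this formulation, a local-perspective loss would apply the same objective per modality (text, audio, visual), while the global perspective applies it to fused subgraph representations.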
Details
- Language :
- English
- ISSN :
- 02181940
- Volume :
- 34
- Issue :
- 11
- Database :
- Complementary Index
- Journal :
- International Journal of Software Engineering & Knowledge Engineering
- Publication Type :
- Academic Journal
- Accession number :
- 180974365
- Full Text :
- https://doi.org/10.1142/S0218194024500360