1. Multi-level video captioning method based on semantic space.
- Author
- Yao, Xiao; Zeng, Yuanlin; Gu, Min; Yuan, Ruxi; Li, Jie; Ge, Junyi
- Subjects
KNOWLEDGE base, KNOWLEDGE representation (Information theory), NATURAL languages, PROBLEM solving, VIDEOS
- Abstract
Video captioning is designed to generate natural language descriptions based on video content. Traditional methods extract visual features and interactive relationship features between objects, but the problem of video feature isolation and semantic hierarchy is ignored. This paper proposes a Multi-Level Video Captioning Method based on semantic space (S-MLM) to solve the above problems. S-MLM extracts different levels of visual elements and visual relationships, and the visual information of different levels is aggregated layer by layer to complete the generation of low-level to high-level visual features. The multi-level structure semantic graph is constructed from the semantic point of view. It does not rely on external knowledge bases, and uses its own information as guidance to enhance feature representation and improve semantic understanding. We conduct experiments on MSVD and MSR-VTT datasets, and the experimental results show that the performance of video captioning is further improved. [ABSTRACT FROM AUTHOR]
- Published
- 2024