Video Captioning Using Global-Local Representation.
- Source :
- IEEE Transactions on Circuits & Systems for Video Technology. Oct 2022, Vol. 32 Issue 10, p6642-6656. 15p.
- Publication Year :
- 2022
- Abstract :
- Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description. To date, state-of-the-art methods inadequately model global-local vision representation for sentence generation, leaving plenty of room for improvement. In this work, we approach the video captioning task from a new perspective and propose a GLR framework, namely a global-local representation granularity. Our GLR demonstrates three advantages over prior efforts. First, we propose a simple solution, which exploits extensive vision representations from different video ranges to improve linguistic expression. Second, we devise a novel global-local encoder, which encodes different video representations, including long-range, short-range and local-keyframe, to produce a rich semantic vocabulary for obtaining a descriptive granularity of video contents across frames. Finally, we introduce a progressive training strategy that effectively organizes feature learning to elicit optimal captioning behavior. Evaluated on the MSR-VTT and MSVD datasets, we outperform recent state-of-the-art methods, including a well-tuned SA-LSTM baseline, by a significant margin, with shorter training schedules. Because of its simplicity and efficacy, we hope that our GLR could serve as a strong baseline for many video understanding tasks besides video captioning. Code will be available. [ABSTRACT FROM AUTHOR]
- Subjects :
- *VIDEOS
*TRAIN schedules
*NATURAL languages
- Language :
- English
- ISSN :
- 10518215
- Volume :
- 32
- Issue :
- 10
- Database :
- Academic Search Index
- Journal :
- IEEE Transactions on Circuits & Systems for Video Technology
- Publication Type :
- Academic Journal
- Accession number :
- 160693876
- Full Text :
- https://doi.org/10.1109/TCSVT.2022.3177320