An encoder-decoder model for video captioning using RESNET and GRU.
- Source :
- AIP Conference Proceedings. 2023, Vol. 2917 Issue 1, p1-10. 10p.
- Publication Year :
- 2023
Abstract
- Video captioning is the process of generating sentences that describe the visual information in a video. It is an essential step in video retrieval and analysis. Unlike still images, the frames in a video are temporally connected, so it is important to consider visual, temporal and grammatical information when generating captions for a video. This is done with an encoder-decoder architecture. In the encoder module, ResNet-152 is used as a feature extractor to obtain features from the video frames. Then, in the decoder module, LSTM and GRU are employed to generate the sentences. The architecture is trained and tested on the benchmark dataset Microsoft Video Description Corpus (MSVD), and performance is evaluated using BLEU, METEOR and CIDEr. [ABSTRACT FROM AUTHOR]
- Subjects :
- *VIDEOS
- *VIDEO processing
- *VIDEO coding
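The abstract names BLEU as one of the evaluation metrics. As a rough illustration only, the sketch below computes the clipped (modified) n-gram precision that BLEU is built on. It is not the authors' evaluation code, it omits BLEU's brevity penalty and the geometric mean over n-gram orders, and the function names and example captions are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of one candidate caption against references.

    Each candidate n-gram is credited at most as many times as it occurs
    in the single reference where it is most frequent.
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0  # candidate is shorter than n tokens

    # Largest count of each n-gram seen in any one reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)

    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical captions, tokenised by whitespace.
candidate = "a man is playing a guitar".split()
references = [
    "a man plays a guitar".split(),
    "someone is playing guitar".split(),
]

unigram_precision = modified_precision(candidate, references, 1)
bigram_precision = modified_precision(candidate, references, 2)
```

Clipping is what keeps a caption from scoring well by repeating a common word: a candidate made of seven copies of "the" earns credit for only two of them against a reference that contains "the" twice.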
Details
- Language :
- English
- ISSN :
- 0094243X
- Volume :
- 2917
- Issue :
- 1
- Database :
- Academic Search Index
- Journal :
- AIP Conference Proceedings
- Publication Type :
- Conference
- Accession number :
- 173433941
- Full Text :
- https://doi.org/10.1063/5.0175606