
An encoder-decoder model for video captioning using RESNET and GRU.

Authors :
Preethi, A.
Dhanalakshmi, P.
Source :
AIP Conference Proceedings. 2023, Vol. 2917 Issue 1, p1-10. 10p.
Publication Year :
2023

Abstract

Video captioning is the process of generating sentences that describe the visual content of a video. It is essential for video retrieval and analysis. Unlike still images, the frames of a video are temporally connected, so visual, temporal and grammatical information must all be considered when generating captions for a video. This is done with an encoder-decoder architecture. In the encoder module, ResNet-152 is used as a feature extractor to obtain features from the video frames. In the decoder module, LSTM and GRU are employed for sentence generation. The architecture is trained and tested on the benchmark Microsoft Video Description Corpus (MSVD), and performance is evaluated using BLEU, METEOR and CIDEr. [ABSTRACT FROM AUTHOR]
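The abstract contains no code. The following is a minimal, hypothetical sketch of the encoder-decoder flow it describes, written in plain Python with untrained random weights so it runs without a deep learning library. It is not the authors' implementation. Assumptions not stated in the record: per-frame feature vectors are given as input (ResNet-152's pooled features are 2048-dimensional in practice; small random vectors stand in here), the encoder output is a simple mean over frames, that video vector initializes the GRU hidden state through a tanh projection, biases are omitted, and captions are produced by greedy decoding. The names mean_pool, GRUCell and greedy_caption are illustrative only.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small uniform random weights; they stand in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def mean_pool(frame_features):
    """Encoder stand-in (assumption): average the per-frame CNN feature vectors."""
    n = len(frame_features)
    return [sum(f[i] for f in frame_features) / n for i in range(len(frame_features[0]))]


class GRUCell:
    """One GRU step without biases: update gate z, reset gate r, candidate state."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrh"}

    def _gate(self, g, x, h, act):
        # act(W_g x + U_g h), elementwise
        return [act(a + b) for a, b in zip(matvec(self.W[g], x), matvec(self.U[g], h))]

    def step(self, x, h):
        z = self._gate("z", x, h, sigmoid)  # how much of the old state to keep
        r = self._gate("r", x, h, sigmoid)  # how much of the old state feeds the candidate
        rh = [ri * hi for ri, hi in zip(r, h)]
        h_tilde = self._gate("h", x, rh, math.tanh)  # candidate: tanh(W_h x + U_h (r*h))
        # Interpolate old state and candidate (gate conventions vary between references).
        return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, h_tilde)]


def greedy_caption(frame_features, vocab, hidden_dim=8, embed_dim=6, max_len=10, seed=0):
    """Greedy decoding from mean-pooled frame features; vocab must contain <bos> and <eos>."""
    rng = random.Random(seed)
    video_vec = mean_pool(frame_features)
    W_init = rand_matrix(hidden_dim, len(video_vec), rng)
    embed = {w: [rng.uniform(-0.1, 0.1) for _ in range(embed_dim)] for w in vocab}
    cell = GRUCell(embed_dim, hidden_dim, rng)
    W_out = rand_matrix(len(vocab), hidden_dim, rng)
    candidates = [i for i, w in enumerate(vocab) if w != "<bos>"]  # never emit <bos>

    h = [math.tanh(a) for a in matvec(W_init, video_vec)]  # video vector -> initial state
    token, caption = "<bos>", []
    for _ in range(max_len):
        h = cell.step(embed[token], h)
        logits = matvec(W_out, h)
        token = vocab[max(candidates, key=lambda i: logits[i])]  # greedy: highest score
        if token == "<eos>":
            break
        caption.append(token)
    return caption


if __name__ == "__main__":
    vocab = ["<bos>", "<eos>", "a", "man", "is", "playing", "guitar"]
    data_rng = random.Random(42)
    # Hypothetical input: 5 frames with 16-dim features instead of real ResNet-152 outputs.
    frames = [[data_rng.random() for _ in range(16)] for _ in range(5)]
    print(greedy_caption(frames, vocab))
```

With untrained weights the output is an arbitrary token sequence from the vocabulary; a real captioner would learn these weights from video-caption pairs such as those in MSVD. Replacing GRUCell with an LSTM cell (which also carries a separate cell state) gives the LSTM decoder variant the abstract mentions.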

Details

Language :
English
ISSN :
0094-243X
Volume :
2917
Issue :
1
Database :
Academic Search Index
Journal :
AIP Conference Proceedings
Publication Type :
Conference
Accession number :
173433941
Full Text :
https://doi.org/10.1063/5.0175606