Video captioning with stacked attention and semantic hard pull.

Authors :
Rahman MM
Abedin T
Prottoy KSS
Moshruba A
Siddiqui FH
Source :
PeerJ. Computer science [PeerJ Comput Sci] 2021 Aug 05; Vol. 7, pp. e664. Date of Electronic Publication: 2021 Aug 05 (Print Publication: 2021).
Publication Year :
2021

Abstract

Video captioning, i.e., the task of generating captions from video sequences, creates a bridge between the Natural Language Processing and Computer Vision domains of computer science. The task of generating a semantically accurate description of a video is quite complex. Considering the complexity of the problem, the results obtained in recent research works are praiseworthy. However, there is plenty of scope for further investigation. This paper addresses this scope and proposes a novel solution. Most video captioning models comprise two sequential/recurrent layers: one as a video-to-context encoder and the other as a context-to-caption decoder. This paper proposes a novel architecture, namely Semantically Sensible Video Captioning (SSVC), which modifies the context generation mechanism by using two novel approaches: "stacked attention" and "spatial hard pull". As there are no exclusive metrics for evaluating video captioning models, we emphasize both quantitative and qualitative analysis of our model. Hence, we have used the BLEU scoring metric for quantitative analysis and have proposed a human evaluation metric for qualitative analysis, namely the Semantic Sensibility (SS) scoring metric. The SS score overcomes the shortcomings of common automated scoring metrics. This paper reports that the use of the aforementioned novelties improves the performance of state-of-the-art architectures.

Competing Interests: The authors declare there are no competing interests.

(©2021 Rahman et al.)

Details

Language :
English
ISSN :
2376-5992
Volume :
7
Database :
MEDLINE
Journal :
PeerJ. Computer science
Publication Type :
Academic Journal
Accession number :
34435104
Full Text :
https://doi.org/10.7717/peerj-cs.664