
Enhanced Image Captioning Using Bahdanau Attention Mechanism and Heuristic Beam Search Algorithm

Authors :
S. Abinaya
Mandava Deepak
A. Sherly Alphonse
Source :
IEEE Access, Vol 12, Pp 100991-101003 (2024)
Publication Year :
2024
Publisher :
IEEE, 2024.

Abstract

Captioning images is a challenging task at the intersection of Computer Vision (CV) and Natural Language Processing (NLP) that involves generating descriptive text to depict the content of an image. Existing methodologies typically employ Convolutional Neural Networks (CNNs) for feature extraction and Recurrent Neural Networks (RNNs) for generating captions. However, these approaches often suffer from a lack of contextual understanding, an inability to capture fine-grained details, and a tendency to generate generic captions. This study proposes VisualCaptionNet (VCN), a novel image captioning model that leverages ResNet50 for rich visual feature extraction and a Long Short-Term Memory (LSTM) network for sequential caption generation while retaining context. By incorporating the Bahdanau attention mechanism to focus on relevant image regions and integrating beam search to produce coherent and contextually relevant descriptions, VCN addresses the limitations of previous methodologies. Extensive experimentation on benchmark datasets such as Flickr30K and Flickr8K demonstrates that VCN achieves notable improvements of 10% and 12% over baseline models in terms of caption quality, coherence, and relevance. These enhancements underscore VCN's effectiveness in advancing image captioning, promising more accurate and contextually relevant descriptions for images.
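
The two decoding ingredients named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the weight matrices `W_f`, `W_h`, the vector `v`, and the toy `step_fn` are all hypothetical, and the beam search shown is the standard textbook variant, not the heuristic variant the paper's title refers to.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def additive_attention(features, hidden, W_f, W_h, v):
    """Bahdanau-style (additive) attention over image region features.

    score_i = v . tanh(W_f f_i + W_h h); the weights are a softmax of the
    scores, and the context vector is the weighted sum of the features.
    All parameters are illustrative placeholders (assumption).
    """
    h_proj = matvec(W_h, hidden)
    scores = []
    for f in features:
        f_proj = matvec(W_f, f)
        scores.append(sum(vi * math.tanh(a + b)
                          for vi, a, b in zip(v, f_proj, h_proj)))
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(dim)]
    return weights, context

def beam_search(step_fn, start, end, beam_width=3, max_len=10):
    """Plain beam search over summed log-probabilities.

    step_fn(seq) is a hypothetical stand-in for the decoder: it returns a
    dict mapping each next token to a strictly positive probability.
    """
    beams = [(0.0, [start])]
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == end:
                # Finished captions are carried forward unchanged.
                candidates.append((score, seq))
                continue
            for token, prob in step_fn(seq).items():
                candidates.append((score + math.log(prob), seq + [token]))
        # Keep only the beam_width highest-scoring partial captions.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq[-1] == end for _, seq in beams):
            break
    return beams[0][1]

# Toy next-token table (assumption) in which the greedy first choice "a"
# leads to a less probable full sequence than the alternative "b".
TOY_TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"z": 1.0},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
    "z": {"</s>": 1.0},
}

def toy_step(seq):
    return TOY_TABLE[seq[-1]]
```

In the toy table, a beam width of 1 (greedy decoding) commits to "a" and ends with a sequence of probability 0.33, while a beam width of 2 keeps "b" alive and recovers the sequence of probability 0.4. This is the general motivation for using beam search over greedy decoding in caption generation.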

Details

Language :
English
ISSN :
21693536
Volume :
12
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.191fc72e6a5463b8fd9d691ddc53b1b
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2024.3431091