Enhanced Image Captioning Using Bahdanau Attention Mechanism and Heuristic Beam Search Algorithm
- Source :
- IEEE Access, Vol. 12, pp. 100991-101003 (2024)
- Publication Year :
- 2024
- Publisher :
- IEEE, 2024.
Abstract
- Captioning images is a challenging task at the intersection of Computer Vision (CV) and Natural Language Processing (NLP) that involves generating descriptive text to depict the content of an image. Existing methodologies typically employ Convolutional Neural Networks (CNNs) for feature extraction and Recurrent Neural Networks (RNNs) for caption generation. However, these approaches often suffer from a lack of contextual understanding, an inability to capture fine-grained details, and a tendency to generate generic captions. This study proposes VisualCaptionNet (VCN), a novel image captioning model that leverages ResNet50 for rich visual feature extraction and a Long Short-Term Memory (LSTM) network for sequential caption generation while retaining context. By incorporating the Bahdanau attention mechanism to focus on relevant image regions and integrating beam search for coherent and contextually relevant descriptions, VCN addresses the limitations of previous methodologies. Extensive experimentation on benchmark datasets such as Flickr30K and Flickr8K demonstrates VCN's notable improvements of 10% and 12%, respectively, over baseline models in terms of caption quality, coherence, and relevance. These enhancements emphasize VCN's effectiveness in advancing image captioning tasks, promising more accurate and contextually relevant descriptions for images.
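- The record does not include the authors' code, but the two components named in the title are standard techniques. The following is a minimal, framework-agnostic NumPy sketch of additive (Bahdanau) attention over image region features and a generic beam search decoder; all names (bahdanau_attention, beam_search, step_log_probs, the token ids) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def bahdanau_attention(features, hidden, W1, W2, v):
    """Additive (Bahdanau) attention over image region features.

    features: (num_regions, feat_dim)  -- e.g. spatial CNN features
    hidden:   (hidden_dim,)            -- current LSTM decoder state
    W1, W2, v: learned projections, passed in here for illustration.
    """
    # score_i = v^T tanh(W1 f_i + W2 h)
    scores = np.tanh(features @ W1 + hidden @ W2) @ v      # (num_regions,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                # softmax over regions
    context = weights @ features                            # weighted sum -> (feat_dim,)
    return context, weights

def beam_search(step_log_probs, beam_width=3, max_len=20, start_id=1, eos_id=0):
    """Generic beam search decoder.

    step_log_probs(seq) must return a log-probability vector over the
    vocabulary for the next token given the partial caption `seq`.
    """
    beams = [([start_id], 0.0)]                             # (token ids, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos_id:                           # finished captions carry over unchanged
                candidates.append((seq, score))
                continue
            log_probs = step_log_probs(seq)
            for tok in np.argsort(log_probs)[-beam_width:]: # top-k next tokens
                candidates.append((seq + [int(tok)], score + float(log_probs[tok])))
        # keep only the beam_width highest-scoring partial captions
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]
```

- A larger beam width explores more candidate captions at higher compute cost; greedy decoding is the special case beam_width=1.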
Details
- Language :
- English
- ISSN :
- 2169-3536
- Volume :
- 12
- Database :
- Directory of Open Access Journals
- Journal :
- IEEE Access
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.191fc72e6a5463b8fd9d691ddc53b1b
- Document Type :
- article
- Full Text :
- https://doi.org/10.1109/ACCESS.2024.3431091