Revolutionizing Image Captioning: Integrating Attention Mechanisms with Adaptive Fusion Gates.
- Source :
- IAENG International Journal of Computer Science; Mar2024, Vol. 51 Issue 3, p212-221, 10p
- Publication Year :
- 2024
Abstract
- To dynamically generate textual descriptions of images, image description models often use an attention mechanism, which automatically focuses on different regions of an image. However, current attention mechanisms often overlook essential elements of the image and instead prioritize the context around an object when generating descriptive text. This reduces the precision of the generated descriptions. To address this issue and improve image interpretation, the proposed image description model takes an attention-based approach that combines a multi-layer decoder with a fusion gate. The model builds on an encoder-decoder architecture: it uses a Residual Network (ResNet) for feature extraction in the encoding phase and extends the encoder-decoder structure in the decoding phase. Within this framework, an adaptive fusion gate mechanism is combined with multi-layer cascade decoders to generate sentences. This lets lower-layer decoders contribute directly to the final text prediction, incrementally improving the accuracy of the predicted text and producing more precise descriptions. The model was trained and validated on the MS COCO 2014 dataset. The results show that the model generates accurate predictions. Compared with the top-performing models, it improves BLEU-1 by 0.096, ROUGE-L by 0.153, and CIDEr by 0.32 on the MS COCO dataset. The consistent improvement across all evaluation metrics indicates that the model is well suited to image understanding applications. [ABSTRACT FROM AUTHOR]
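The abstract does not give the equations for the attention step or the adaptive fusion gate, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes plain dot-product soft attention over region feature vectors (for example, spatial outputs of a ResNet encoder) and a hypothetical element-wise sigmoid gate, g = sigmoid(w_low*h_low + w_high*h_high + b), that blends a lower decoder layer's hidden state into a higher one. The names `attend`, `fusion_gate`, and `cascade_fuse` and all parameters are illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Sequence[float], regions: Sequence[Sequence[float]]) -> Vector:
    """Soft attention (assumed dot-product form): weight each region
    feature by softmax(query . region) and return the weighted sum."""
    weights = softmax([dot(query, r) for r in regions])
    dim = len(regions[0])
    return [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]


def sigmoid(x: float) -> float:
    """Overflow-safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fusion_gate(h_low, h_high, w_low, w_high, bias) -> Vector:
    """Hypothetical element-wise gate (an assumption, not the paper's formula):
    g = sigmoid(w_low*h_low + w_high*h_high + b); out = g*h_high + (1-g)*h_low.
    g near 1 keeps the higher layer; g near 0 keeps the lower layer."""
    fused = []
    for lo, hi, wl, wh, b in zip(h_low, h_high, w_low, w_high, bias):
        g = sigmoid(wl * lo + wh * hi + b)
        fused.append(g * hi + (1.0 - g) * lo)
    return fused


def cascade_fuse(layer_states, gate_params) -> Vector:
    """Fuse decoder layer states bottom-up: gate_params[k] = (w_low, w_high, bias)
    gates layer k+1 against the running fused state of the layers below it."""
    fused = list(layer_states[0])
    for state, (w_low, w_high, bias) in zip(layer_states[1:], gate_params):
        fused = fusion_gate(fused, state, w_low, w_high, bias)
    return fused


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]
    print(attend([0.0, 0.0], regions))  # equal weights -> [0.5, 0.5]
    zeros = [0.0, 0.0]
    print(fusion_gate([0.0, 2.0], [1.0, 0.0], zeros, zeros, zeros))  # g = 0.5 -> [0.5, 1.0]
```

Chaining the gate from the lowest layer upward, as `cascade_fuse` does, is one simple way lower-layer decoders could contribute to the final prediction. The paper may parameterize the gate differently, for example with full weight matrices instead of element-wise weights.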
Details
- Language :
- English
- ISSN :
- 1819-656X
- Volume :
- 51
- Issue :
- 3
- Database :
- Supplemental Index
- Journal :
- IAENG International Journal of Computer Science
- Publication Type :
- Academic Journal
- Accession number :
- 175782942