Revolutionizing Image Captioning: Integrating Attention Mechanisms with Adaptive Fusion Gates.

Authors :
Shou-Jun Sheng
Zi-Wei Zhou
Source :
IAENG International Journal of Computer Science; Mar2024, Vol. 51 Issue 3, p212-221, 10p
Publication Year :
2024

Abstract

Image description models commonly use an attention mechanism, which automatically focuses on different regions of an image, to generate a sequence of textual descriptions dynamically. However, current attention mechanisms tend to overlook essential elements within the image and to prioritize the context surrounding an object when generating descriptive text, which reduces the precision of the resulting descriptions. To address this issue and improve the accuracy of image interpretation, the proposed image description model takes an attention-based approach and includes a multi-layer decoder and a fusion gate. The model is based on an encoder-decoder architecture, uses a Residual Network (ResNet) for feature extraction during the encoding phase, and extends the encoder-decoder structure in the decoding phase. Within this framework, an adaptive fusion gate mechanism is introduced and combined with multi-layer cascaded decoders to generate the descriptive sentences. This allows decoders from lower layers to contribute directly to the final text prediction, incrementally improving the accuracy of the predicted text and producing more precise descriptions. The MS COCO 2014 dataset was used to train the model and to validate its effectiveness in understanding images. The results show that the model generates high-quality predictions. Compared with the top-performing models, it improves BLEU-1 by 0.096, ROUGE-L by 0.153, and CIDEr by 0.32 on the MS COCO dataset. The improvement across all evaluation criteria indicates that the model is well suited to image understanding applications. [ABSTRACT FROM AUTHOR]
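
The abstract does not give the equations for the adaptive fusion gate, so the following is only an illustrative sketch of one common way such a gate can be built, not the authors' implementation. It assumes each decoder layer emits a hidden vector of the same size and that a sigmoid gate blends a lower-layer state with a higher-layer state element by element. The names fusion_gate, fuse_decoder_stack, W, and b are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fusion_gate(h_low, h_high, W, b):
    """Blend two decoder states with an element-wise sigmoid gate (assumed form)."""
    # Gate values lie in (0, 1) and are computed from both states.
    g = sigmoid(np.concatenate([h_low, h_high]) @ W + b)
    # Convex combination: g weights the lower layer, 1 - g the higher layer.
    return g * h_low + (1.0 - g) * h_high

def fuse_decoder_stack(states, W, b):
    """Fold the gate over decoder states ordered from lowest to highest layer."""
    fused = states[0]
    for h in states[1:]:
        fused = fusion_gate(fused, h, W, b)
    return fused

# Toy example: three decoder layers, hidden size 8, untrained random weights.
rng = np.random.default_rng(0)
d = 8
states = [rng.standard_normal(d) for _ in range(3)]
W = rng.standard_normal((2 * d, d)) * 0.1
b = np.zeros(d)
fused = fuse_decoder_stack(states, W, b)
```

In this sketch the fused vector would feed the final word-prediction layer, so lower decoder layers influence the output directly instead of only through the layers above them. Sharing one W and b across all layers is a simplification made here for brevity.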

Details

Language :
English
ISSN :
1819656X
Volume :
51
Issue :
3
Database :
Supplemental Index
Journal :
IAENG International Journal of Computer Science
Publication Type :
Academic Journal
Accession number :
175782942