
Combining semi-supervised model and optimized LSTM for image caption generation based on pseudo labels.

Authors :
Padate, Roshni
Jain, Amit
Kalla, Mukesh
Sharma, Arvind
Source :
Multimedia Tools & Applications; Mar 2024, Vol. 83 Issue 10, p29997-30017, 21p
Publication Year :
2024

Abstract

Image captioning is a crucial area of artificial intelligence that remained largely intractable until the advent of deep learning. Many open challenges remain, such as robustness, generalization, and accuracy, and results are still far from satisfactory. Because image captioning schemes are data-hungry, pre-training on large-scale datasets, even if they are not well curated, has become a common approach. Beyond accurately identifying the scene, the objects, their relationships, and the attributes of the items in an image, a caption generation method should produce natural, fluent, precise, and useful sentences. However, since not all visual information can be exploited, effectively conveying an image's content in a caption remains difficult. Here, image captioning is performed with two models: an NIC (Neural Image Caption) model and an LSTM-based model. First, the NIC stage carries out CNN-based caption generation for both the labelled and the unlabelled datasets. Next, improved bag-of-words (BOW) and N-gram features are derived and used to train the CNN model. The final caption is generated by an optimized LSTM whose weights are tuned by Harris Hawks optimization with Sinusoidal Chaotic Map Assisted Exploitation (HH-SCME). Finally, BLEU, ROUGE, and CIDEr scores are computed to demonstrate the efficiency of HH-SCME. The proposed LSTM+HH-SCME model achieves a BLEU-1 score of 0.84, outperforming existing methods such as CNN, SSO, PRO, AOA, RNN, and plain LSTM. [ABSTRACT FROM AUTHOR]
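
The abstract describes a pseudo-label pipeline (captions generated for unlabelled images, then used as training data) evaluated with BLEU-1. The sketch below is not the authors' implementation; it only illustrates the two ideas in miniature, with a hypothetical stubbed captioner (`stub_captioner`), an assumed confidence threshold of 0.8, and a plain BLEU-1 (clipped unigram precision with brevity penalty), all chosen for illustration.

```python
"""Minimal sketch of pseudo-labelling plus BLEU-1 scoring.

Assumptions (not from the paper): the captioner is a stub returning a
(caption, confidence) pair, and pseudo-captions are accepted above a
fixed confidence threshold. The actual pipeline uses a CNN-based NIC
model and an HH-SCME-optimised LSTM."""
from collections import Counter
import math


def bleu_1(reference, candidate):
    """Unigram BLEU: clipped unigram precision times a brevity penalty."""
    ref_counts = Counter(reference)
    clipped = sum(min(c, ref_counts[w]) for w, c in Counter(candidate).items())
    precision = clipped / max(len(candidate), 1)
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(
        1 - len(reference) / max(len(candidate), 1))
    return bp * precision


def stub_captioner(image_id):
    """Hypothetical stand-in for a trained NIC captioner."""
    fake_captions = {
        "img_001": ("a dog runs across the field", 0.91),
        "img_002": ("a blurry object near water", 0.42),
    }
    return fake_captions.get(image_id, ("unknown scene", 0.10))


def pseudo_label(unlabelled_ids, threshold=0.8):
    """Keep only confidently captioned images as pseudo-labelled data."""
    accepted = {}
    for image_id in unlabelled_ids:
        caption, confidence = stub_captioner(image_id)
        if confidence >= threshold:
            accepted[image_id] = caption
    return accepted


if __name__ == "__main__":
    pseudo = pseudo_label(["img_001", "img_002"])
    print("pseudo-labelled:", pseudo)
    ref = "a brown dog runs across the green field".split()
    print(f"BLEU-1 for img_001: {bleu_1(ref, pseudo['img_001'].split()):.2f}")
```

In a full semi-supervised setup, the accepted pseudo-captions would be merged with the labelled data and the captioning model retrained; the confidence filter above is just one simple way to decide which pseudo labels to trust.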

Details

Language :
English
ISSN :
13807501
Volume :
83
Issue :
10
Database :
Complementary Index
Journal :
Multimedia Tools & Applications
Publication Type :
Academic Journal
Accession number :
175897006
Full Text :
https://doi.org/10.1007/s11042-023-16687-x