Improving image captioning with Pyramid Attention and SC-GAN

Authors :
Tianyu Chen
Bianping Su
Huifang Ma
Zhixin Li
Jingli Wu
Source :
Image and Vision Computing. 117:104340
Publication Year :
2022
Publisher :
Elsevier BV, 2022.

Abstract

Most existing image captioning models use global attention, which represents features of the whole image; local attention, which represents object features; or a combination of the two. Few models integrate the relationship information between the object regions of an image, yet this information is also highly informative for caption generation. For example, if a football appears in an image, there is a high probability that the image also contains people near the football. In this article, the relationship feature is embedded into global-local attention to construct a new Pyramid Attention mechanism, which can explore the internal visual and semantic relationships between different object regions. In addition, to alleviate the exposure bias problem and make training more efficient, we propose a new method for applying the Generative Adversarial Network to sequence generation. Greedy decoding is used to generate an efficient baseline reward for self-critical training. Finally, experiments on the MSCOCO dataset show that the model generates more accurate and vivid captions and outperforms many recent advanced models on the prevailing evaluation metrics for both the local and online test sets.
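As a rough illustration of the self-critical training step mentioned in the abstract, the following Python sketch computes a REINFORCE-style loss in which the reward of the greedily decoded caption serves as the baseline for a sampled caption. It follows the general self-critical sequence training formulation and is not the authors' implementation. `toy_reward` is a hypothetical stand-in for the real reward (for example a metric such as CIDEr, or a discriminator score in an adversarial setup), and the per-token log-probabilities are assumed to come from the captioning model.

```python
from collections import Counter
from typing import List, Sequence


def toy_reward(caption: Sequence[str], references: List[Sequence[str]]) -> float:
    """Hypothetical stand-in for a caption reward such as CIDEr or a
    discriminator score: the best unigram F1 against any reference."""
    cand = Counter(caption)
    best = 0.0
    for ref in references:
        overlap = sum((cand & Counter(ref)).values())  # clipped unigram matches
        if overlap == 0:
            continue
        precision = overlap / len(caption)
        recall = overlap / len(ref)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


def self_critical_loss(sample_log_probs: Sequence[float],
                       sampled: Sequence[str],
                       greedy: Sequence[str],
                       references: List[Sequence[str]]) -> float:
    """REINFORCE loss with the greedy caption's reward as the baseline.

    Minimizing it raises log p(sampled) when the sampled caption outscores
    the greedy one, and lowers it when the sample scores worse.
    """
    advantage = toy_reward(sampled, references) - toy_reward(greedy, references)
    return -advantage * sum(sample_log_probs)


# Usage with made-up tokens and log-probabilities (illustrative only).
refs = [["a", "man", "kicks", "a", "football"],
        ["a", "person", "playing", "football"]]
sampled = ["a", "man", "kicks", "a", "football"]  # caption drawn by sampling
greedy = ["a", "man", "on", "grass"]              # caption from greedy decoding
log_probs = [-0.1, -0.5, -1.2, -0.3, -0.4]        # per-token log p of `sampled`
loss = self_critical_loss(log_probs, sampled, greedy, refs)
print(round(loss, 4))  # 1.3889
```

Because the baseline comes from the model's own greedy output, only samples that score better than the greedy caption receive a positive advantage and have their probability pushed up. In a real model, `sample_log_probs` would be differentiable tensors so that minimizing the loss updates the network's parameters.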

Details

ISSN :
02628856
Volume :
117
Database :
OpenAIRE
Journal :
Image and Vision Computing
Accession number :
edsair.doi...........a4a307162f45fcf3b40ccd1fc2def0d4
Full Text :
https://doi.org/10.1016/j.imavis.2021.104340