Improving image captioning with Pyramid Attention and SC-GAN
- Source :
- Image and Vision Computing. 117:104340
- Publication Year :
- 2022
- Publisher :
- Elsevier BV, 2022.
Abstract
- Most existing image captioning models rely on global attention, which represents whole-image features, local attention, which represents object features, or a combination of the two; few models integrate the relationship information between the object regions of an image. This relationship information is nonetheless very instructive for caption generation. For example, if a football appears, there is a high probability that the image also contains people near the football. In this article, the relationship feature is embedded into global-local attention to construct a new Pyramid Attention mechanism, which can explore the internal visual and semantic relationships between different object regions. In addition, to alleviate the exposure bias problem and make the training process more efficient, we propose a new method for applying the Generative Adversarial Network to sequence generation. The greedy decoding method is used to generate an efficient baseline reward for self-critical training. Finally, experiments on the MSCOCO dataset show that the model can generate more accurate and vivid captions and outperforms many recent advanced models on various prevailing evaluation metrics, on both local and online test sets.
- Subjects :
- Closed captioning
Sequence
Computer science
business.industry
Process (engineering)
ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION
Object (computer science)
Machine learning
computer.software_genre
Image (mathematics)
Feature (computer vision)
Signal Processing
Computer Vision and Pattern Recognition
Artificial intelligence
Pyramid (image processing)
business
computer
Decoding methods
Details
- ISSN :
- 02628856
- Volume :
- 117
- Database :
- OpenAIRE
- Journal :
- Image and Vision Computing
- Accession number :
- edsair.doi...........a4a307162f45fcf3b40ccd1fc2def0d4
- Full Text :
- https://doi.org/10.1016/j.imavis.2021.104340