Image Caption Generation with Part of Speech Guidance
- Source :
- Pattern Recognition Letters. 119:229-237
- Publication Year :
- 2019
- Publisher :
- Elsevier BV, 2019.
Abstract
- As a fundamental problem in image understanding, image caption generation has attracted much attention from both the computer vision and natural language processing communities. In this paper, we focus on how to exploit the structural information of the natural sentence used to describe the content of an image. We find that the Part of Speech (PoS) tags of a sentence are very effective cues for guiding a Long Short-Term Memory (LSTM) based word generator. More specifically, given a sentence, the PoS tag of each word is used to determine whether it is essential to input the image representation into the word generator. Benefiting from this strategy, our model can closely connect the visual attributes of an image to the word concepts in the natural language space. Experimental results on popular benchmark datasets, e.g., Flickr30k and MS COCO, consistently demonstrate that our method significantly enhances the performance of a standard image caption generation model and achieves competitive results.
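The gating idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the set of "visual" tags, the zero-vector fallback, and the names `VISUAL_TAGS` and `gate_image_input` are all assumptions made for illustration, since the record does not say which tags trigger image input or how the input is withheld.

```python
# Hypothetical choice of PoS tags treated as visually grounded.
# The record does not specify which tags the paper uses.
VISUAL_TAGS = {"NOUN", "ADJ", "VERB"}


def gate_image_input(pos_tags, image_feature, visual_tags=VISUAL_TAGS):
    """Return one input vector per time step.

    For a word whose PoS tag is in `visual_tags`, the image feature is
    passed through; otherwise a zero vector is used, so the word
    generator receives no visual signal at that step (an assumed
    fallback, not confirmed by the source).
    """
    zero = [0.0] * len(image_feature)
    return [
        list(image_feature) if tag in visual_tags else list(zero)
        for tag in pos_tags
    ]


# Tags for the caption "a dog runs on the grass".
tags = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]
feature = [0.5, 1.0]  # toy image representation
inputs = gate_image_input(tags, feature)
```

In a full model, each entry of `inputs` would be combined with the previous word embedding before being fed to the LSTM at the corresponding step.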
- Subjects :
- Structure (mathematical logic)
Focus (computing)
business.industry
Computer science
Speech recognition
020207 software engineering
02 engineering and technology
computer.software_genre
Part of speech
Artificial Intelligence
Signal Processing
0202 electrical engineering, electronic engineering, information engineering
Benchmark (computing)
020201 artificial intelligence & image processing
Computer Vision and Pattern Recognition
Artificial intelligence
business
computer
Software
Natural language
Sentence
Word (computer architecture)
Natural language processing
Generator (mathematics)
Details
- ISSN :
- 0167-8655
- Volume :
- 119
- Database :
- OpenAIRE
- Journal :
- Pattern Recognition Letters
- Accession number :
- edsair.doi...........7520a05f4f97935474026b85505adf39