
A New Image Captioning Approach for Visually Impaired People

Authors:
Burak Makav
Volkan Kilic
Source:
2019 11th International Conference on Electrical and Electronics Engineering (ELECO)
Publication Year:
2019
Publisher:
IEEE, 2019.

Abstract

Automatic caption generation in natural language to describe the visual content of an image has attracted increasing attention over the last decade due to its potential applications. Generating captions with proper linguistic properties is challenging, as it requires a level of image understanding that goes far beyond image classification and object detection. In this paper, we propose to use the Stanford CoreNLP model to generate a caption after the images are processed with the VGG16 deep learning architecture. The visual attributes of the images, which convey richer content, are extracted with VGG16 and then fed into the Stanford model for caption generation. Experimental results on the MSCOCO dataset show that the proposed model significantly and consistently outperforms state-of-the-art approaches across different evaluation metrics.
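The abstract describes a two-stage pipeline: VGG16 extracts visual attributes from an image, and those features are then passed to a language-side model for caption generation. The sketch below illustrates only the first stage, assuming a standard Keras VGG16 backbone pretrained on ImageNet; the Stanford CoreNLP caption-generation stage is not reproduced, and `caption_from_features` is a hypothetical placeholder for it.

```python
# Minimal sketch of the VGG16 feature-extraction step (assumption: Keras/TensorFlow
# implementation; the paper's exact setup may differ).
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.models import Model

# Use VGG16 with its ImageNet weights and take the penultimate fully connected
# layer ("fc2") as a 4096-dimensional visual descriptor of the image.
base = VGG16(weights="imagenet")
feature_extractor = Model(inputs=base.input,
                          outputs=base.get_layer("fc2").output)

def extract_visual_attributes(image_path: str) -> np.ndarray:
    """Return a 4096-d VGG16 feature vector for a single image."""
    img = load_img(image_path, target_size=(224, 224))    # VGG16 expects 224x224 inputs
    x = img_to_array(img)
    x = preprocess_input(np.expand_dims(x, axis=0))       # ImageNet-style preprocessing
    return feature_extractor.predict(x, verbose=0)[0]

# Usage (second stage is hypothetical and stands in for the caption generator):
# features = extract_visual_attributes("example.jpg")
# caption = caption_from_features(features)
```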

Details

Database:
OpenAIRE
Journal:
2019 11th International Conference on Electrical and Electronics Engineering (ELECO)
Accession number:
edsair.doi...........51c0260533c39749717a90a83a963e3a
Full Text:
https://doi.org/10.23919/eleco47770.2019.8990630