1. Multi-task learning for captioning images with novel words
- Author
Zheng, He; Wu, Jiahong; Liang, Rui; Li, Ye; Li, Xuzhi
- Abstract
Recent captioning models are limited in their ability to describe concepts that do not appear in paired image–sentence training data. This study presents a multi-task learning framework for describing novel words not present in existing image-captioning datasets. The framework takes advantage of external sources: labelled images from image classification datasets and semantic knowledge extracted from annotated text. The authors propose minimising a joint objective that learns from these diverse data sources and leverages distributional semantic embeddings. At inference time, they modify the beam search step to score candidates with both the captioning model and a language model, enabling the model to generalise to novel words outside image-captioning datasets. They demonstrate that adding annotated text data to the framework helps the captioning model describe images with the correct novel words. Extensive experiments are conducted on the AI Challenger and Microsoft COCO (MSCOCO) image-captioning datasets, which cover two different languages, demonstrating the framework's ability to describe novel words such as scenes and objects.
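The abstract's modified beam search, which scores each candidate word with both the captioning model and the language model, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the log-linear interpolation rule, and the weight `alpha` are all assumptions, since the abstract does not specify how the two scores are combined.

```python
import math

def combined_score(caption_logprob, lm_logprob, alpha=0.7):
    # Hypothetical combination rule: a weighted sum of the two models'
    # log-probabilities. The paper's actual formula may differ.
    return alpha * caption_logprob + (1 - alpha) * lm_logprob

def beam_search_step(beams, caption_step, lm_step, beam_size=3, alpha=0.7):
    """Expand each partial caption one word, scoring candidates jointly.

    beams        -- list of (token_list, cumulative_score) pairs
    caption_step -- fn(tokens) -> [(word, log_prob), ...] from the caption model
    lm_step      -- fn(tokens, word) -> log_prob from the language model
    """
    candidates = []
    for tokens, score in beams:
        for word, cap_lp in caption_step(tokens):
            lm_lp = lm_step(tokens, word)
            candidates.append(
                (tokens + [word], score + combined_score(cap_lp, lm_lp, alpha))
            )
    # Keep only the top beam_size hypotheses by joint score.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_size]
```

Because the language model is trained on text alone, it can assign reasonable probability to novel words the caption model has never seen paired with images, which is what lets the joint score surface them during decoding.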
- Published
- 2019