Topic scene graphs for image captioning.

Authors :: Zhang, Min
Chen, Jingxiang
Li, Pengfei
Jiang, Ming
Zhou, Zhe
Source :: IET Computer Vision (Wiley-Blackwell); Jun2022, Vol. 16 Issue 4, p364-375, 12p
Publication Year :: 2022
Abstract: When describing an image, people can rapidly extract its topic and find the main object, generating sentences that match the main idea of the image. However, most scene graph generation methods do not emphasise the importance of the image's topic. Consequently, the captions generated by scene graph‐based image captioning models cannot reflect the topic of the image and therefore fail to express its central idea. In this paper, we propose a method for image captioning based on topic scene graphs (TSG). Firstly, we propose the structure of topic scene graphs, which express both the topic of an image and the relationships between its objects. Then, we utilise salient object detection to generate a topic scene graph that highlights the salient objects of the image. Note that our framework is agnostic to the underlying scene graph‐based image captioning model and can thus be widely applied wherever salient object predictions are sought. We compare the performance of our topic scene graph with state‐of‐the‐art scene graph generation models and mainstream image captioning models on the MSCOCO and Visual Genome datasets, achieving better performance on both. [ABSTRACT FROM AUTHOR]
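
The abstract does not give the data structure itself, so the following is only a minimal illustrative sketch of the general idea: a scene graph whose objects carry a saliency score, from which the relations touching salient ("topic") objects are selected. The names `SceneGraph` and `topic_subgraph`, the saliency scores, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # Object name -> saliency score in [0, 1]. The scores are assumed to come
    # from some salient object detector; the values used here are made up.
    saliency: dict[str, float] = field(default_factory=dict)
    # Relationships stored as (subject, predicate, object) triples.
    relations: list[tuple[str, str, str]] = field(default_factory=list)


def topic_subgraph(graph: SceneGraph, threshold: float = 0.5):
    """Return the salient objects and the relations that involve them.

    The threshold is an illustrative assumption, not the paper's criterion.
    """
    topics = {name for name, score in graph.saliency.items() if score >= threshold}
    triples = [
        (subj, pred, obj)
        for subj, pred, obj in graph.relations
        if subj in topics or obj in topics
    ]
    return topics, triples


graph = SceneGraph(
    saliency={"man": 0.9, "horse": 0.7, "tree": 0.1, "fence": 0.05},
    relations=[
        ("man", "riding", "horse"),
        ("tree", "behind", "fence"),
        ("horse", "near", "fence"),
    ],
)
topics, triples = topic_subgraph(graph)
```

In this toy example the background relation between the tree and the fence is dropped, while the relations involving the man and the horse are kept for a downstream captioning model.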