
Cross-Modal Semantic Matching Generative Adversarial Networks for Text-to-Image Synthesis

Authors:
Hongchen Tan
Xiuping Liu
Xin Li
Baocai Yin
Source:
IEEE Transactions on Multimedia, 24:832-845
Publication Year:
2022
Publisher:
Institute of Electrical and Electronics Engineers (IEEE), 2022.

Abstract

Synthesizing photo-realistic images from text descriptions is a challenging image generation problem. Although many recent approaches have significantly advanced the performance of text-to-image generation, guaranteeing semantic matching between the text description and the synthesized image remains very challenging. In this paper, we propose a new model, Cross-modal Semantic Matching Generative Adversarial Networks (CSM-GAN), to improve the semantic consistency between a text description and the synthesized image for fine-grained text-to-image generation. Two new modules are proposed in CSM-GAN: the Text Encoder Module (TEM) and the Textual-Visual Semantic Matching Module (TVSMM). TVSMM aims to make each synthesized image closer, in a global semantic embedding space, to its corresponding text description than to mismatched descriptions. This improves semantic consistency and, consequently, the generalizability of CSM-GAN. In TEM, we introduce Text Convolutional Neural Networks (Text_CNNs) to capture and highlight local visual features in textual descriptions. Thorough experiments on two public benchmark datasets demonstrate the superiority of CSM-GAN over other representative state-of-the-art methods.
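The abstract describes two components that can be sketched concretely: a Text_CNN sentence encoder (1-D convolutions over word embeddings) and a cross-modal matching objective that pulls matched image-text pairs closer together in a shared embedding space than mismatched pairs. The PyTorch sketch below illustrates the general techniques only; it is not the authors' CSM-GAN implementation, and the names (TextCNNEncoder, matching_loss), layer sizes, and margin value are all assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNNEncoder(nn.Module):
    # Sketch of a Text-CNN encoder: 1-D convolutions with several
    # kernel widths over word embeddings, max-pooled over time and
    # concatenated into a global sentence embedding. Sizes are
    # illustrative, not taken from the paper.
    def __init__(self, vocab_size=5000, embed_dim=300,
                 kernel_sizes=(2, 3, 4), num_filters=128, out_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes)
        self.proj = nn.Linear(num_filters * len(kernel_sizes), out_dim)

    def forward(self, tokens):                    # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)    # (batch, embed_dim, seq_len)
        # Each conv highlights local n-gram features; max-pooling keeps
        # the strongest response per filter.
        feats = [F.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.proj(torch.cat(feats, dim=1))  # (batch, out_dim)

def matching_loss(img_emb, txt_emb, margin=0.2):
    # Hinge-based ranking loss (an assumed realization of the matching
    # idea): cosine similarity of a matched image-text pair must exceed
    # that of every mismatched pair in the batch by at least `margin`.
    img = F.normalize(img_emb, dim=1)
    txt = F.normalize(txt_emb, dim=1)
    sim = img @ txt.t()                     # (batch, batch) similarity matrix
    pos = sim.diag().unsqueeze(1)           # matched-pair similarities
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost_txt = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0)
    cost_img = (margin + sim - pos.t()).clamp(min=0).masked_fill(mask, 0)
    return cost_txt.mean() + cost_img.mean()

In use, an image encoder producing embeddings of the same dimension as out_dim would be paired with this text encoder, and matching_loss added to the generator objective so that every matched pair outscores the mismatched pairs in its batch.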

Details

ISSN:
1941-0077 and 1520-9210
Volume:
24
Database:
OpenAIRE
Journal:
IEEE Transactions on Multimedia
Accession number:
edsair.doi...........a5fcf90ae840cdc49df34b68e64ad721