Text-to-image synthesis: Starting composite from the foreground content.

Authors :: Zhang, Zhiqiang; Zhou, Jinjia; Yu, Wenxin; Jiang, Ning
Source :: Information Sciences. Aug 2022, Vol. 607, p1265-1285. 21p.
Publication Year :: 2022
Abstract :: Recently, text-to-image synthesis has become a hot topic in computer vision and has attracted wide attention. Many methods have achieved encouraging results, but further improving the quality of the synthesized images remains a great challenge. In this paper, we propose a multi-stage synthesis method that starts compositing from the foreground content. The whole synthesis process is divided into three stages. The first stage generates the foreground results, and the third stage synthesizes the final image. The second stage has two variants: one continues to synthesize the foreground results; the other synthesizes image results with background information. Experiments demonstrate that continuing to generate foreground results in the second stage achieves better results on the Caltech-UCSD Birds (CUB) and Oxford-102 datasets, while synthesizing foreground results only in the first stage performs better on the Microsoft Common Objects in Context (MS COCO) dataset. In addition, our synthesized results on the three datasets are subjectively more realistic, with better handling of detail. The method also outperforms most existing methods in quantitative comparisons, which demonstrates its effectiveness and superiority. [ABSTRACT FROM AUTHOR]