
Transformer models for enhancing AttnGAN based text to image generation

Authors:
M. Indupriya
T.V. Manikanta
P.V. Sudeep
M. S. S. Ram Kiran
S. Naveen
Source:
Image and Vision Computing. 115:104284
Publication Year:
2021
Publisher:
Elsevier BV, 2021.

Abstract

Deep neural networks are capable of producing photographic images that depict given natural language text descriptions. Such models have huge potential in applications such as interior design, video games, editing, and facial sketching for digital forensics. However, only a limited number of methods in the literature have been developed for text-to-image (TTI) generation. Most of them use Generative Adversarial Network (GAN) based deep learning methods. Attentional GAN (AttnGAN) is a popular GAN-based TTI method that extracts meaningful information from the given text descriptions using an attention mechanism. In this paper, we investigate the use of different Transformer models, such as BERT, GPT2, and XLNet, with AttnGAN to solve the challenge of extracting semantic information from the text descriptions. Accordingly, the proposed AttnGANTRANS architecture has three variants: AttnGANBERT, AttnGANXL, and AttnGANGPT. The proposed method outperforms the conventional AttnGAN, improving the inception score by 27.23% and reducing the Fréchet inception distance by 49.9%. The results of our experiments indicate that the proposed method has the potential to outperform contemporary state-of-the-art methods and validate the use of Transformer models in improving the performance of TTI generation. The code is made publicly available at https://github.com/sairamkiran9/AttnGAN-trans.
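The core idea described in the abstract is to swap AttnGAN's recurrent text encoder for a Transformer that produces the same two outputs the generator expects: per-word features (consumed by word-level attention) and a pooled sentence feature (conditioning the first generator stage). The following is a minimal PyTorch sketch of that interface only; it uses a generic `nn.TransformerEncoder` as a stand-in rather than an actual BERT/GPT2/XLNet checkpoint, and the class name, vocabulary size, and feature dimensions are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class TransformerTextEncoder(nn.Module):
    """Hypothetical stand-in for AttnGAN's LSTM text encoder: maps token
    ids to per-word features plus a pooled sentence feature."""

    def __init__(self, vocab_size=5000, embed_dim=256, nhead=4, nlayers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=nhead, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=nlayers)

    def forward(self, token_ids):
        # word_feats: (batch, seq_len, embed_dim) -- feeds AttnGAN's
        # word-level attention over image sub-regions.
        word_feats = self.encoder(self.embed(token_ids))
        # sent_feat: (batch, embed_dim) -- conditions the initial
        # low-resolution generator stage (mean pooling for simplicity).
        sent_feat = word_feats.mean(dim=1)
        return word_feats, sent_feat

enc = TransformerTextEncoder()
tokens = torch.randint(0, 5000, (2, 12))  # 2 captions, 12 tokens each
words, sent = enc(tokens)
print(words.shape, sent.shape)  # torch.Size([2, 12, 256]) torch.Size([2, 256])
```

In the paper's actual variants, this module would instead wrap a pretrained BERT, XLNet, or GPT2 backbone (e.g. via the Hugging Face `transformers` library), with the hidden states projected to the dimensions the AttnGAN generator was built for.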

Details

ISSN:
0262-8856
Volume:
115
Database:
OpenAIRE
Journal:
Image and Vision Computing
Accession number:
edsair.doi...........6238d11e4a2c3153ddf63d93a7ea4c00
Full Text:
https://doi.org/10.1016/j.imavis.2021.104284