Transformer models for enhancing AttnGAN based text to image generation
- Source :
- Image and Vision Computing. 115:104284
- Publication Year :
- 2021
- Publisher :
- Elsevier BV, 2021.
Abstract
- Deep neural networks can produce photographic images that depict given natural language text descriptions. Such models have considerable potential in applications such as interior design, video games, editing, and facial sketching for digital forensics. However, only a limited number of methods in the literature have been developed for text-to-image (TTI) generation, and most of them are deep learning methods based on Generative Adversarial Networks (GANs). Attentional GAN (AttnGAN) is a popular GAN-based TTI method that uses an attention mechanism to extract meaningful information from the given text descriptions. In this paper, we investigate the use of different Transformer models, such as BERT, GPT2, and XLNet, with AttnGAN to address the challenge of extracting semantic information from the text descriptions. The proposed AttnGANTRANS architecture accordingly has three variants: AttnGANBERT, AttnGANXL, and AttnGANGPT. The proposed method outperforms the conventional AttnGAN, increasing the inception score by 27.23% and reducing the Fréchet inception distance by 49.9%. The results of our experiments indicate that the proposed method has the potential to outperform contemporary state-of-the-art methods and validate the use of Transformer models for improving TTI generation. The code is publicly available at https://github.com/sairamkiran9/AttnGAN-trans .
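The attention step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each word of the description has already been encoded as a feature vector (in the proposed variants these would come from a Transformer encoder such as BERT, GPT2, or XLNet) and that one image sub-region acts as the query. The function names, the toy three-dimensional vectors, and the omission of any learned projection are assumptions made for illustration; this is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region, word_features):
    """Attend from one image-region vector over a list of word vectors.

    Returns (weights, context): weights[i] is the attention paid to word i,
    and context is the attention-weighted sum of the word vectors.
    """
    # Dot-product similarity between the region query and every word vector.
    scores = [sum(r * w for r, w in zip(region, word)) for word in word_features]
    weights = softmax(scores)
    dim = len(word_features[0])
    context = [
        sum(weights[i] * word_features[i][d] for i in range(len(word_features)))
        for d in range(dim)
    ]
    return weights, context


# Hypothetical word features (stand-ins for Transformer outputs).
word_features = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# Hypothetical image-region query that is most similar to the second word.
region = [0.0, 2.0, 0.0]

weights, context = attend(region, word_features)
print("attention weights:", weights)
print("context vector:", context)
```

In this toy setup the second word receives the largest weight because its vector aligns with the region query. Under this sketch, swapping the text encoder changes only where `word_features` comes from; the attention computation itself stays the same.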
- Subjects :
- Computer science
Deep learning
Digital forensics
Machine learning
Image (mathematics)
Signal Processing
Code (cryptography)
Computer Vision and Pattern Recognition
Artificial intelligence
Architecture
Generative grammar
Natural language
Transformer (machine learning model)
Details
- ISSN :
- 0262-8856
- Volume :
- 115
- Database :
- OpenAIRE
- Journal :
- Image and Vision Computing
- Accession number :
- edsair.doi...........6238d11e4a2c3153ddf63d93a7ea4c00
- Full Text :
- https://doi.org/10.1016/j.imavis.2021.104284