137 results for "text-to-image synthesis"
Search Results
2. Design of Augmented Diffusion Model for Text-to-Image Representation Using Hybrid GAN
- Author
-
Ansari, Subuhi Kashif, Kumar, Rakesh, Li, Gang, Series Editor, Filipe, Joaquim, Series Editor, Xu, Zhiwei, Series Editor, Taratukhin, Victor, editor, Levchenko, Artem, editor, and Kim, Sohyeong, editor
- Published
- 2025
- Full Text
- View/download PDF
3. PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion
- Author
-
Lu, Guansong, Guo, Yuanfan, Han, Jianhua, Niu, Minzhe, Zeng, Yihan, Xu, Songcen, Huang, Zeyi, Zhong, Zhao, Zhang, Wei, Xu, Hang, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
- Full Text
- View/download PDF
4. SwiftBrush V2: Make Your One-Step Diffusion Model Better Than Its Teacher
- Author
-
Dao, Trung, Nguyen, Thuan Hoang, Le, Thanh, Vu, Duc, Nguyen, Khoi, Pham, Cuong, Tran, Anh, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
- Full Text
- View/download PDF
5. Advancing Persistent Character Generation: Comparative Analysis of Fine-Tuning Techniques for Diffusion Models.
- Author
-
Martini, Luca, Iacono, Saverio, Zolezzi, Daniele, and Vercelli, Gianni Viardo
- Subjects
Low-rank matrices, Artificial intelligence, Comparative studies
- Abstract
In the evolving field of artificial intelligence, fine-tuning diffusion models is crucial for generating contextually coherent digital characters across various media. This paper examines four advanced fine-tuning techniques: Low-Rank Adaptation (LoRA), DreamBooth, Hypernetworks, and Textual Inversion. Each technique enhances the specificity and consistency of character generation, expanding the applications of diffusion models in digital content creation. LoRA efficiently adapts models to new tasks with minimal adjustments, making it ideal for environments with limited computational resources. It excels in low VRAM contexts due to its targeted fine-tuning of low-rank matrices within cross-attention layers, enabling faster training and efficient parameter tweaking. DreamBooth generates highly detailed, subject-specific images but is computationally intensive and suited for robust hardware environments. Hypernetworks introduce auxiliary networks that dynamically adjust the model's behavior, allowing for flexibility during inference and on-the-fly model switching. This adaptability, however, can result in slightly lower image quality. Textual Inversion embeds new concepts directly into the model's embedding space, allowing for rapid adaptation to novel styles or concepts, but is less effective for precise character generation. This analysis shows that LoRA is the most efficient for producing high-quality outputs with minimal computational overhead. In contrast, DreamBooth excels in high-fidelity images at the cost of longer training. Hypernetworks provide adaptability with some tradeoffs in quality, while Textual Inversion serves as a lightweight option for style integration. These techniques collectively enhance the creative capabilities of diffusion models, delivering high-quality, contextually relevant outputs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
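The LoRA technique compared in entry 5 above adapts a frozen pretrained projection by learning only a low-rank update to its weights. The following is a minimal PyTorch sketch of that idea; the class name, rank, and scaling factor are illustrative assumptions, not the implementation used in the paper.

```python
# Hypothetical sketch of Low-Rank Adaptation (LoRA): instead of fine-tuning a full
# weight matrix W, train a low-rank update B @ A added to the frozen projection.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weights
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # frozen path plus scaled low-rank update
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

# usage: wrap a cross-attention projection and train only the LoRA parameters
proj = nn.Linear(768, 768)
lora_proj = LoRALinear(proj, rank=8)
trainable = [p for p in lora_proj.parameters() if p.requires_grad]  # only A and B
out = lora_proj(torch.randn(2, 77, 768))
```

Because only the small A and B matrices receive gradients, this kind of adapter trains quickly and fits in low-VRAM settings, which is the trade-off the abstract above highlights.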
6. Persistent Homology Analysis of AI-Generated Fractal Patterns: A Mathematical Framework for Evaluating Geometric Authenticity.
- Author
-
Lee, Minhyeok and Lee, Soyeon
- Subjects
Patterns (mathematics), Stable Diffusion, Computational topology, Fractal analysis, Fractal dimensions
- Abstract
We present a mathematical framework for analyzing fractal patterns in AI-generated images using persistent homology. Given a text-to-image mapping M : T → I, we demonstrate that the persistent homology groups H_k(t) of sublevel set filtrations {f^{-1}((−∞, t])}_{t ∈ R} characterize multi-scale geometric structures, where f : M(p) → R is the grayscale intensity function of a generated image. The primary challenge lies in quantifying self-similarity across scales, which we address by analyzing birth–death pairs (b_i, d_i) in the persistence diagram PD(M(p)). Our contribution extends beyond applying the stability theorem to AI-generated fractals; we establish how the self-similarity inherent in fractal patterns manifests in the persistence diagrams of generated images. We validate our approach using the Stable Diffusion 3.5 model for four fractal categories: ferns, trees, spirals, and crystals. An analysis of guidance scale effects γ ∈ [4.0, 8.0] reveals monotonic relationships between model parameters and topological features. Stability testing confirms robustness under noise perturbations η ≤ 0.2, with feature count variations Δμ_f < 0.5. Our framework provides a foundation for enhancing generative models and evaluating their geometric fidelity in fractal pattern synthesis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
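Entry 6 above builds persistence diagrams from sublevel set filtrations of the grayscale intensity function of a generated image. The sketch below computes only the H0 (connected-component) part of such a diagram with a plain union-find under an assumed 4-connectivity on the pixel grid; it is an illustration of the construction, not the authors' code, and it omits H1 loops and the one infinite bar.

```python
# Hypothetical sketch of H0 sublevel-set persistence on a grayscale image:
# pixels enter the filtration in order of increasing intensity, connected
# components are tracked with union-find, and when two components merge the
# younger one dies (elder rule), yielding a (birth, death) pair.
import numpy as np

def sublevel_h0_persistence(img: np.ndarray):
    _, w = img.shape
    order = np.argsort(img, axis=None, kind="stable")   # pixels by increasing intensity
    parent = {}                                          # union-find forest over added pixels
    birth = {}                                           # component root -> birth value

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]                # path compression
            p = parent[p]
        return p

    pairs = []
    for flat in order:
        y, x = divmod(int(flat), w)
        v = float(img[y, x])
        parent[(y, x)] = (y, x)
        birth[(y, x)] = v
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (ny, nx) in parent:                       # neighbour already in the sublevel set
                ra, rb = find((y, x)), find((ny, nx))
                if ra != rb:
                    # the component born later dies at the current level
                    old, young = (ra, rb) if birth[ra] <= birth[rb] else (rb, ra)
                    pairs.append((birth[young], v))
                    parent[young] = old
    return pairs                                          # list of (birth, death) pairs

diagram = sublevel_h0_persistence(np.random.rand(64, 64))
```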
7. Swinv2-Imagen: hierarchical vision transformer diffusion models for text-to-image generation.
- Author
-
Li, Ruijun, Li, Weihua, Yang, Yi, Wei, Hanyu, Jiang, Jianhua, and Bai, Quan
- Subjects
Graph neural networks, Transformer models, Language models, Image processing
- Abstract
Recently, diffusion models have been proven to perform remarkably well in text-to-image synthesis tasks in a number of studies, immediately presenting new study opportunities for image generation. Google's Imagen follows this research trend and outperforms DALLE2 as the best model for text-to-image generation. However, Imagen merely uses a T5 language model for text processing, which cannot ensure learning the semantic information of the text. Furthermore, the Efficient UNet leveraged by Imagen is not the best choice in image processing. To address these issues, we propose the Swinv2-Imagen, a novel text-to-image diffusion model based on a Hierarchical Visual Transformer and a Scene Graph incorporating a semantic layout. In the proposed model, the feature vectors of entities and relationships are extracted and involved in the diffusion model, effectively improving the quality of generated images. On top of that, we also introduce a Swin-Transformer-based UNet architecture, called Swinv2-Unet, which can address the problems stemming from the CNN convolution operations. Extensive experiments are conducted to evaluate the performance of the proposed model by using three real-world datasets, i.e. MSCOCO, CUB and MM-CelebA-HQ. The experimental results show that the proposed Swinv2-Imagen model outperforms several popular state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Masked-attention diffusion guidance for spatially controlling text-to-image generation.
- Author
-
Endo, Yuki
- Subjects
Stable Diffusion, Attention control, Semantic computing, Beam steering, Diffusion control
- Abstract
Text-to-image synthesis has achieved high-quality results with recent advances in diffusion models. However, text input alone has high spatial ambiguity and limited user controllability. Most existing methods allow spatial control through additional visual guidance (e.g., sketches and semantic masks) but require additional training with annotated images. In this paper, we propose a method for spatially controlling text-to-image generation without further training of diffusion models. Our method is based on the insight that the cross-attention maps reflect the positional relationship between words and pixels. Our aim is to control the attention maps according to given semantic masks and text prompts. To this end, we first explore a simple approach of directly swapping the cross-attention maps with constant maps computed from the semantic regions. Some prior works also allow training-free spatial control of text-to-image diffusion models by directly manipulating cross-attention maps. However, these approaches still suffer from misalignment to given masks because manipulated attention maps are far from actual ones learned by diffusion models. To address this issue, we propose masked-attention guidance, which can generate images more faithful to semantic masks via indirect control of attention to each word and pixel by manipulating noise images fed to diffusion models. Masked-attention guidance can be easily integrated into pre-trained off-the-shelf diffusion models (e.g., Stable Diffusion) and applied to the tasks of text-guided image editing. Experiments show that our method enables more accurate spatial control than baselines qualitatively and quantitatively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
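Entry 8 above first discusses a baseline that directly swaps a word's cross-attention map with a constant map computed from a semantic region before proposing masked-attention guidance. The following is a rough sketch of that swap on a softmaxed cross-attention tensor; the tensor layout, renormalization, and function name are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch: overwrite the cross-attention column of one word with a
# constant map derived from a binary semantic mask, so the word can only
# influence pixels inside its region.
import torch

def swap_attention_with_mask(attn_probs, token_idx, region_mask):
    """
    attn_probs:  (batch, num_pixels, num_tokens) softmaxed cross-attention
    token_idx:   index of the word whose spatial layout we want to control
    region_mask: (num_pixels,) binary mask, 1 inside the desired region
    """
    constant_map = region_mask / region_mask.sum().clamp(min=1)   # uniform inside the region
    attn = attn_probs.clone()
    attn[:, :, token_idx] = constant_map                          # overwrite the word's map
    # renormalise so each pixel's attention over tokens still sums to one
    return attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-8)

probs = torch.softmax(torch.randn(1, 64 * 64, 77), dim=-1)
mask = (torch.rand(64 * 64) > 0.7).float()
controlled = swap_attention_with_mask(probs, token_idx=5, region_mask=mask)
```

The paper's point is that such hard swaps drift far from the attention maps the diffusion model actually learned, which is why it instead steers the noise input indirectly (masked-attention guidance).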
9. ISF-GAN: Imagine, Select, and Fuse with GPT-Based Text Enrichment for Text-to-Image Synthesis.
- Author
-
Sheng, Yefei, Tao, Ming, Wang, Jie, and Bao, Bing-Kun
- Subjects
Semantics, Problem solving
- Abstract
Text-to-Image synthesis aims to generate an accurate and semantically consistent image from a given text description. However, it is difficult for existing generative methods to generate semantically complete images from a single piece of text. Some works try to expand the input text to multiple captions via retrieving similar descriptions of the input text from the training set but still fail to fill in missing image semantics. In this article, we propose a GAN-based approach to Imagine, Select, and Fuse for Text-to-image synthesis, named ISF-GAN. The proposed ISF-GAN contains Imagine Stage and Select and Fuse Stage to solve the above problems. First, the Imagine Stage proposes a text completion and enrichment module. This module guides a GPT-based model to enrich the text expression beyond the original dataset. Second, the Select and Fuse Stage selects qualified text descriptions and then introduces a cross-modal attentional mechanism to interact these different sentence embeddings with the image features at different scales. In short, our proposed model enriches the input text information for completing missing semantics and introduces a cross-modal attentional mechanism to maximize the utilization of enriched text information to generate semantically consistent images. Experimental results on CUB, Oxford-102, and CelebA-HQ datasets prove the effectiveness and superiority of the proposed network. Code is available at https://github.com/Feilingg/ISF-GAN [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Artificial Intelligence in 2D Games: Analysis on Customised Character Generation
- Author
-
Hemlata Parmar, Murari, Utsav Krishan, Hameurlain, Abdelkader, Editorial Board Member, Rocha, Álvaro, Series Editor, Idri, Ali, Editorial Board Member, Vaseashta, Ashok, Editorial Board Member, Dubey, Ashwani Kumar, Editorial Board Member, Montenegro, Carlos, Editorial Board Member, Laporte, Claude, Editorial Board Member, Moreira, Fernando, Editorial Board Member, Peñalvo, Francisco, Editorial Board Member, Dzemyda, Gintautas, Editorial Board Member, Mejia-Miranda, Jezreel, Editorial Board Member, Hall, Jon, Editorial Board Member, Piattini, Mário, Editorial Board Member, Holanda, Maristela, Editorial Board Member, Tang, Mincong, Editorial Board Member, Ivanovíc, Mirjana, Editorial Board Member, Muñoz, Mirna, Editorial Board Member, Kanth, Rajeev, Editorial Board Member, Anwar, Sajid, Editorial Board Member, Herawan, Tutut, Editorial Board Member, Colla, Valentina, Editorial Board Member, Devedzic, Vladan, Editorial Board Member, Raj, Pethuru, editor, Rocha, Alvaro, editor, Singh, Simar Preet, editor, Dutta, Pushan Kumar, editor, and Sundaravadivazhagan, B., editor
- Published
- 2024
- Full Text
- View/download PDF
11. Text-to-Image Generation with Multiscale Semantic Context-Aware Generative Adversarial Networks
- Author
-
Dong, Pei, Wu, Lei, Meng, Lei, Meng, Xiangxu, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Huang, De-Shuang, editor, Pan, Yijie, editor, and Guo, Jiayang, editor
- Published
- 2024
- Full Text
- View/download PDF
12. CELL-E: A Text-to-Image Transformer for Protein Image Prediction
- Author
-
Khwaja, Emaad, Song, Yun S., Huang, Bo, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, van Leeuwen, Jan, Series Editor, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Kobsa, Alfred, Series Editor, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Nierstrasz, Oscar, Series Editor, Pandu Rangan, C., Editorial Board Member, Sudan, Madhu, Series Editor, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Weikum, Gerhard, Series Editor, Vardi, Moshe Y, Series Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, and Ma, Jian, editor
- Published
- 2024
- Full Text
- View/download PDF
13. Text-To-Image Generation Using Generative Adversarial Networks with Adaptive Attribute Modulation
- Author
-
Srilatha, M., Reddy, P. Chenna, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Kaiser, M. Shamim, editor, Xie, Juanying, editor, and Rathore, Vijay Singh, editor
- Published
- 2024
- Full Text
- View/download PDF
14. Text-to-Image Synthesis with Threshold-Equipped Matching-Aware GAN
- Author
-
Shang, Jun, Yu, Wenxin, Che, Lu, Zhang, Zhiqiang, Cai, Hongjie, Deng, Zhiyu, Gong, Jun, Chen, Peng, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Luo, Biao, editor, Cheng, Long, editor, Wu, Zheng-Guang, editor, Li, Hongyi, editor, and Li, Chaojie, editor
- Published
- 2024
- Full Text
- View/download PDF
15. Seeing is No Longer Believing: A Survey on the State of Deepfakes, AI-Generated Humans, and Other Nonveridical Media
- Author
-
Pocol, Andreea, Istead, Lesley, Siu, Sherman, Mokhtari, Sabrina, Kodeiri, Sara, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sheng, Bin, editor, Bi, Lei, editor, Kim, Jinman, editor, Magnenat-Thalmann, Nadia, editor, and Thalmann, Daniel, editor
- Published
- 2024
- Full Text
- View/download PDF
16. AMM-GAN: Attribute-Matching Memory for Person Text-to-Image Generation
- Author
-
Yue, Wei, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
17. Learning Scene Graph for Better Cross-Domain Image Captioning
- Author
-
Jia, Junhua, Xin, Xiaowei, Gao, Xiaoyan, Ding, Xiangqian, Pang, Shunpeng, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
18. Co-GAN: A Text-to-Image Synthesis Model with Local and Integral Features
- Author
-
Liu, Lulu, Xie, Ziqi, Chen, Yufei, Deng, Qiujun, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Luo, Biao, editor, Cheng, Long, editor, Wu, Zheng-Guang, editor, Li, Hongyi, editor, and Li, Chaojie, editor
- Published
- 2024
- Full Text
- View/download PDF
19. Text to Image Generation with Conformer-GAN
- Author
-
Deng, Zhiyu, Yu, Wenxin, Che, Lu, Chen, Shiyu, Zhang, Zhiqiang, Shang, Jun, Chen, Peng, Gong, Jun, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Luo, Biao, editor, Cheng, Long, editor, Wu, Zheng-Guang, editor, Li, Hongyi, editor, and Li, Chaojie, editor
- Published
- 2024
- Full Text
- View/download PDF
20. Exploring Progress in Text-to-Image Synthesis: An In-Depth Survey on the Evolution of Generative Adversarial Networks
- Author
-
Md Ahsan Habib, Md Anwar Hussen Wadud, Md Fazlul Karim Patwary, Mohammad Motiur Rahman, M. F. Mridha, Yuichi Okuyama, and Jungpil Shin
- Subjects
Generative adversarial networks, attention mechanism, C-GAN, consistency, text-to-image synthesis, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
The emergence of generative adversarial networks (GANs) has ignited substantial interest in the domain of synthesizing images from textual descriptions. This approach has demonstrated remarkable versatility and user-friendliness in producing conditioned images, showing notable progress in diversity, visual realism, and semantic alignment in recent years. Notwithstanding these developments, the discipline still faces difficulties, such as producing high-resolution pictures with several objects and developing trustworthy evaluation standards that align with human perception. The goal of this study is to provide a comprehensive overview of the current state of generative text-to-image models. It examines how they have evolved over the previous five years and proposes a classification system based on the degree of supervision required. The paper highlights shortcomings, provides a critical evaluation of current approaches for assessing text-to-image synthesis models, and suggests areas for further study, including improving model training and architecture design, developing more reliable assessment criteria, and refining datasets. This review, which focuses on text-to-image synthesis, is a useful addition to earlier surveys on generative adversarial networks and offers guidance for future studies on the subject.
- Published
- 2024
- Full Text
- View/download PDF
21. Text-to-Image Synthesis With Generative Models: Methods, Datasets, Performance Metrics, Challenges, and Future Direction
- Author
-
Sarah K. Alhabeeb and Amal A. Al-Shargabi
- Subjects
Deep learning, diffusion model, generative models, generative adversarial network, text-to-image synthesis, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Text-to-image synthesis, the process of turning words into images, opens up a world of creative possibilities, and meets the growing need for engaging visual experiences in a world that is becoming more image-based. As machine learning capabilities expanded, the area progressed from simple tools and systems to robust deep learning models that can automatically generate realistic images from textual inputs. Modern, large-scale text-to-image generation models have made significant progress in this direction, producing diversified and high-quality images from text description prompts. Although several methods exist, Generative Adversarial Networks (GANs) have long held a position of prominence. However, diffusion models have recently emerged, with results much beyond those achieved by GANs. This study offers a concise overview of text-to-image generative models by examining the existing body of literature and providing a deeper understanding of this topic. This will be accomplished by providing a concise summary of the development of text-to-image synthesis, previous tools and systems employed in this field, key types of generative models, as well as an exploration of the relevant research conducted on GANs and diffusion models. Additionally, the study provides an overview of common datasets utilized for training the text-to-image model, compares the evaluation metrics used for evaluating the models, and addresses the challenges encountered in the field. Finally, concluding remarks are provided to summarize the findings and implications of the study and open issues for further research.
- Published
- 2024
- Full Text
- View/download PDF
22. GACnet-Text-to-Image Synthesis With Generative Models Using Attention Mechanisms With Contrastive Learning
- Author
-
Md. Ahsan Habib, Md. Anwar Hussen Wadud, Lubna Yeasmin Pinky, Mehedi Hasan Talukder, Mohammad Motiur Rahman, M. F. Mridha, Yuichi Okuyama, and Jungpil Shin
- Subjects
Text-to-image synthesis, generative adversarial networks, C-GAN, attention mechanism, contrastive learning technique, consistency, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
The generation of high-quality images from textual descriptions is a challenging task in computer vision and natural language processing. The goal of text-to-image synthesis, a current topic of research, is to produce excellent images from written descriptions. This study proposes a hybrid approach to evaluating a dataset consisting of various text-image pairs by efficiently combining conditional generative adversarial networks (C-GAN), attention mechanisms, and contrastive learning (C-GAN+ATT+CL). We suggest a two-step method to improve image quality that starts by utilizing generative adversarial networks (GANs) with attention mechanisms to create low-resolution images and then applies contrastive learning to refine them. Contrastive learning modules train on a separate dataset of high-resolution pictures; GANs learn on datasets of low-resolution text and image pairs. The Conditional GAN with Attention Mechanism and Contrastive Learning Method provides state-of-the-art performance in terms of image quality, diversity, and visual realism among the methods compared. The results of this study demonstrate that the proposed approach outperforms all compared methods, achieving an Inception Score (IS) of 35.23, a Fréchet Inception Distance (FID) of 18.2, and an R-Precision of 89.14. Our findings demonstrate that our “C-GAN+ATT+CL” approach significantly improves image quality and diversity and offers exciting paths for further study.
- Published
- 2024
- Full Text
- View/download PDF
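Entry 22 above reports an Inception Score (IS) of 35.23. For reference, IS is computed from the class posteriors p(y|x) that an Inception classifier assigns to generated images, as the exponential of the mean KL divergence between p(y|x) and the marginal p(y). A small sketch follows, assuming the posteriors have already been extracted by a classifier elsewhere.

```python
# Hypothetical sketch of the Inception Score: IS = exp( E_x [ KL( p(y|x) || p(y) ) ] ).
import numpy as np

def inception_score(posteriors, eps=1e-12):
    # posteriors: (num_images, num_classes), each row a softmax distribution p(y|x)
    p_y = posteriors.mean(axis=0, keepdims=True)                       # marginal p(y)
    kl = (posteriors * (np.log(posteriors + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

probs = np.random.dirichlet(np.ones(1000), size=500)                   # stand-in posteriors
score = inception_score(probs)
```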
23. A survey of generative adversarial networks and their application in text-to-image synthesis
- Author
-
Wu Zeng, Heng-liang Zhu, Chuan Lin, and Zheng-ying Xiao
- Subjects
cross-modal, generative adversarial networks, text-to-image synthesis, deep learning, Mathematics, QA1-939, Applied mathematics. Quantitative methods, T57-57.97
- Abstract
With the continuous development of science and technology (especially computational devices with powerful computing capabilities), the image generation technology based on deep learning has also made significant achievements. Most cross-modal technologies based on deep learning can generate information from text into images, which has become a hot topic of current research. Text-to-image (T2I) synthesis technology has applications in multiple fields of computer vision, such as image enhancement, artificial intelligence painting, games and virtual reality. The T2I generation technology using generative adversarial networks can generate more realistic and diverse images, but there are also some shortcomings and challenges, such as difficulty in generating complex backgrounds. This review will be introduced in the following order. First, we introduce the basic principles and architecture of basic and classic generative adversarial networks (GANs). Second, this review categorizes T2I synthesis methods into four main categories. There are methods based on semantic enhancement, methods based on progressive structure, methods based on attention and methods based on introducing additional signals. We have chosen some of the classic and latest T2I methods for introduction and explain their main advantages and shortcomings. Third, we explain the basic dataset and evaluation indicators in the T2I field. Finally, prospects for future research directions are discussed. This review provides a systematic introduction to the basic GAN method and the T2I method based on it, which can serve as a reference for researchers.
- Published
- 2023
- Full Text
- View/download PDF
24. DE-GAN: Text-to-image synthesis with dual and efficient fusion model.
- Author
-
Jiang, Bin, Zeng, Weiyuan, Yang, Chao, Wang, Renjun, and Zhang, Bolin
- Abstract
Generating diverse and plausible images conditioned on the given captions is an attractive but challenging task. While many existing studies have presented impressive results, text-to-image synthesis still suffers from two problems. (1) The fact that noise is only injected at the very beginning hurts the diversity of final results. (2) Most previous models exploit non-local-like spatial attention mechanisms to introduce fine-grained word-level information in the generation process, which makes these models too storage-consuming to apply to mobile and embedded applications. In this paper, we propose a novel Dual and Efficient Fusion Generative Adversarial Network (DE-GAN) to cope with the issues above. To balance the diversity and fidelity of generated images, DE-GAN utilizes Dual Injection Blocks to simultaneously inject noise and text embeddings into the model multiple times during the generation process. In addition, an efficient condition channel attention module is designed in DE-GAN to capture the correlations between text and image modalities to guide the network in refining image features with as little storage overhead as possible, enabling the model to adapt to resource-constrained applications. Comprehensive experiments on two benchmark datasets demonstrate that DE-GAN efficiently generates more diverse and photo-realistic images compared to previous methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. Multimodal Generative Model Based Text-to-Image Synthesis.
- Author
-
Nang Kham Htwe and Win Pa Pa
- Subjects
Generative adversarial networks
- Abstract
Text-to-image synthesis (T2I) is a challenging task because the model must create high-quality images that are semantically consistent and realistic. Therefore, the main objective of this paper is to improve the quality of generated images and the similarity level between text descriptions and these images. In this paper, we propose deep fusion generative adversarial networks (DF-GAN) with a multimodal similarity model (MSM) to generate high-resolution images with better consistency between text and the generated images. In this work, MSM is pretrained using real images with captions in the dataset. This pretrained model is used to improve the visual-semantic consistency level during training of T2I. This paper investigates the improvement in the image generation process due to applying the MSM model to the generator. The investigation is performed on two different datasets with different languages to show that our proposed model outperforms the baseline DF-GAN. First, the experiment is done on the Caltech-Birds dataset and the evaluative results of the proposed model are compared with state-of-the-art models: StackGAN, StackGAN++, AttnGAN, DMGAN, DAE-GAN, TIME, and DF-GAN. According to the comparative results, the proposed model outperforms the baseline state-of-the-art models in terms of Fréchet inception distance (FID) score and inception score. The improvements of the proposed model on synthesizing images over the baseline models are demonstrated in terms of image quality and visual-semantic similarity in this work. Accordingly, the proposed model is applied to Myanmar text-to-image synthesis (Myanmar T2I) with the Oxford 102 flowers dataset annotated in Myanmar to prove the effectiveness of the proposed model on a different dataset and language. To the best of our knowledge, this is the first attempt at implementing generative adversarial networks with a multimodal model for Myanmar T2I, and it achieved an inception score of 3.54 ± 0.03 and an FID score of 49.97. In Myanmar T2I, the proposed model also achieved better performance than the baseline DF-GAN, with a higher inception score and a lower FID score than the baseline model. Drawing upon the experimental findings, it is evident that the proposed model improves the quality of generated images. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. A Text-to-Image Generation Method with Determinantal Point Process Sampling.
- Author
-
李晓霖, 李 刚, 张恩琪, and 顾广华
- Subjects
Generative adversarial networks, Point processes
- Abstract
Objectives: In recent years, a great breakthrough has been made in the text-to-image generation problem based on generative adversarial networks (GAN). Such models can generate corresponding images based on the semantic information of the text, and have great application value. However, the current generated image results usually lack specific texture details, and often have problems such as collapsed modes and lack of diversity. Methods: This paper proposes a determinantal point process for generative adversarial networks (GAN-DPP) to improve the quality of the generated samples, and uses two baseline models, StackGAN++ and ControlGAN, to implement GAN-DPP. During training, it uses a determinantal point process kernel to model the diversity of real data and synthetic data, and encourages the generator, through a penalty loss, to generate diverse data similar to the real data. It improves the clarity and diversity of generated samples, and reduces problems such as mode collapse. No extra computation is added during training. Results: This paper compares the generated results using quantitative indicators. For the inception score, a high value indicates that the image clarity and diversity have improved. On the Oxford-102 dataset, the score of GAN-DPP-S is increased by 3.1% compared with StackGAN++, and the score of GAN-DPP-C is 3.4% higher than that of ControlGAN. For the CUB dataset, the score of GAN-DPP-S increased by 8.2%, and the score of GAN-DPP-C increased by 1.9%. For the Fréchet Inception Distance score, the lower the value, the better the quality of image generation. On the Oxford-102 dataset, the score of GAN-DPP-S is reduced by 11.1%, and the score of GAN-DPP-C is reduced by 11.2%. For the CUB dataset, the score of GAN-DPP-S is reduced by 6.4%, and the score of GAN-DPP-C is reduced by 3.1%. Conclusions: The qualitative and quantitative comparative experiments prove that the proposed GAN-DPP method improves the performance of the generative adversarial network model. The image texture details generated by the model are more abundant, and the diversity is significantly improved. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
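Entry 26 above penalizes the generator with a determinantal point process kernel so that synthetic batches match the diversity of real batches. The sketch below illustrates one plausible form of such a penalty using the log-determinant of an RBF kernel over batch features; the kernel choice and the hinge form are assumptions, not the paper's exact loss.

```python
# Hypothetical DPP-style diversity penalty: the log-determinant of a similarity
# kernel over a batch grows with diversity, so we penalise fake batches that are
# less diverse than real ones.
import torch

def rbf_kernel(feats, gamma=0.5):
    d2 = torch.cdist(feats, feats).pow(2)                 # pairwise squared distances
    return torch.exp(-gamma * d2)

def dpp_diversity_penalty(real_feats, fake_feats, eps=1e-4):
    n = real_feats.shape[0]
    eye = eps * torch.eye(n, device=real_feats.device)    # jitter for numerical stability
    logdet_real = torch.logdet(rbf_kernel(real_feats) + eye)
    logdet_fake = torch.logdet(rbf_kernel(fake_feats) + eye)
    return torch.relu(logdet_real - logdet_fake)           # zero when fakes are as diverse

penalty = dpp_diversity_penalty(torch.randn(16, 128), torch.randn(16, 128))
```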
27. Prior knowledge guided text to image generation.
- Author
-
Liu, An-An, Sun, Zefang, Xu, Ning, Kang, Rongbao, Cao, Jinbo, Yang, Fan, Qin, Weijun, Zhang, Shenyuan, Zhang, Jiaqi, and Li, Xuanya
- Subjects
Prior learning, Affine transformations, Generative adversarial networks
- Abstract
Generating a realistic and semantically consistent image from a given text is a challenging task. Due to the limited information of natural language, it is difficult to generate vivid images with fine details. To address this problem, we propose a Prior Knowledge Guided GAN for text to image generation. Specifically, the proposed method consists of several Knowledge Guided Up-Blocks. We decompose the image into a superposition of several visual regions, each of which requires corresponding prior knowledge to enrich its visual details. Correspondingly, we construct each Up-Block by incorporating relevant prior knowledge as input, aiming to enhance the quality of each visual region. Prior knowledge progressively provides more visual detail through affine transformations. Finally, high-quality images are synthesized by fusing all image regions. Experimental results on the CUB and COCO datasets demonstrate the superior performance of the proposed method.
• We decompose the image into a superposition of several visual regions.
• We provide related prior knowledge as input to enhance the quality of each visual region.
• We help establish the semantic mapping between the knowledge captions and the corresponding image regions.
• Extensive experiments on the CUB and COCO datasets demonstrate the superior performance of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Auditing and instructing text-to-image generation models on fairness
- Author
-
Friedrich, Felix, Brack, Manuel, Struppek, Lukas, Hintersdorf, Dominik, Schramowski, Patrick, Luccioni, Sasha, and Kersting, Kristian
- Published
- 2024
- Full Text
- View/download PDF
29. ChatGPT: Inside and Impact on Business Automation
- Author
-
Huang, Ken, Xing, Chunxiao, Huang, Ken, editor, Wang, Yang, editor, Zhu, Feng, editor, Chen, Xi, editor, and Xing, Chunxiao, editor
- Published
- 2023
- Full Text
- View/download PDF
30. Fine-Grained Face Sketch-Photo Synthesis with Text-Guided Diffusion Models
- Author
-
Liu, Jin, Huang, Huaibo, Cao, Jie, Duan, Junxian, He, Ran, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lu, Huimin, editor, Blumenstein, Michael, editor, Cho, Sung-Bae, editor, Liu, Cheng-Lin, editor, Yagi, Yasushi, editor, and Kamiya, Tohru, editor
- Published
- 2023
- Full Text
- View/download PDF
31. Text-to-Image Synthesis using BERT Embeddings and Multi-Stage GAN
- Author
-
Rani, Poonam, Kumar, Devender, Sudhakar, Nupur, Prakash, Deepak, Shubham, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Hassanien, Aboul Ella, editor, Castillo, Oscar, editor, Anand, Sameer, editor, and Jaiswal, Ajay, editor
- Published
- 2023
- Full Text
- View/download PDF
32. A survey of generative adversarial networks and their application in text-to-image synthesis.
- Author
-
Zeng, Wu, Zhu, Heng-liang, Lin, Chuan, and Xiao, Zheng-ying
- Subjects
Deep learning, Convolutional neural networks, Artificial intelligence, Semantics, Computer vision
- Abstract
With the continuous development of science and technology (especially computational devices with powerful computing capabilities), the image generation technology based on deep learning has also made significant achievements. Most cross-modal technologies based on deep learning can generate information from text into images, which has become a hot topic of current research. Text-to-image (T2I) synthesis technology has applications in multiple fields of computer vision, such as image enhancement, artificial intelligence painting, games and virtual reality. The T2I generation technology using generative adversarial networks can generate more realistic and diverse images, but there are also some shortcomings and challenges, such as difficulty in generating complex backgrounds. This review will be introduced in the following order. First, we introduce the basic principles and architecture of basic and classic generative adversarial networks (GANs). Second, this review categorizes T2I synthesis methods into four main categories. There are methods based on semantic enhancement, methods based on progressive structure, methods based on attention and methods based on introducing additional signals. We have chosen some of the classic and latest T2I methods for introduction and explain their main advantages and shortcomings. Third, we explain the basic dataset and evaluation indicators in the T2I field. Finally, prospects for future research directions are discussed. This review provides a systematic introduction to the basic GAN method and the T2I method based on it, which can serve as a reference for researchers. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
33. Aerospace Information Acquisition and Image Generation Based on Supervised Contrastive Learning.
- Author
-
齐翌辰 and 赵伟超
- Subjects
Classification
- Published
- 2023
- Full Text
- View/download PDF
34. Word self-update contrastive adversarial networks for text-to-image synthesis.
- Author
-
Xiao, Jian, Sun, Yiwen, and Bi, Xiaojun
- Subjects
Computer vision, New words, Vocabulary, Eye tracking
- Abstract
Synthesizing realistic fine-grained images from text descriptions is a significant computer vision task. Although many GANs-based methods have been proposed to solve this task, generating high-quality images consistent with text information remains a difficult problem. These existing GANs-based methods ignore important words due to the use of fixed initial word features in generator, and neglect to learn semantic consistency between images and texts for discriminators. In this article, we propose a novel attentional generation and contrastive adversarial framework for fine-grained text-to-image synthesis, termed as Word Self-Update Contrastive Adversarial Networks (WSC-GAN). Specifically, we introduce a dual attention module for modeling color details and semantic information. With a new designed word self-update module, the generator can leverage visually important words to compute attention maps in the feature synthesis module. Furthermore, we contrive multi-branch contrastive discriminators to maintain better consistency between the generated image and text description. Two novel contrastive losses are proposed for our discriminators to impose image-sentence and image-word consistency constraints. Extensive experiments on CUB and MS-COCO datasets demonstrate that our method achieves better performance compared with state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
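Entry 34 above equips its discriminators with image-sentence and image-word contrastive losses. A generic image-sentence term of this kind can be written as a symmetric InfoNCE loss over a batch of paired embeddings, as sketched below; the temperature and normalization are assumptions rather than the paper's exact formulation.

```python
# Hypothetical image-sentence contrastive term: matched pairs are pulled together,
# mismatched pairs in the batch are pushed apart, symmetrically over rows and columns.
import torch
import torch.nn.functional as F

def image_sentence_contrastive(img_emb, txt_emb, temperature=0.07):
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.T / temperature                 # (batch, batch) similarity matrix
    targets = torch.arange(img.shape[0], device=img.device)
    # each image should match its own sentence, and vice versa
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

loss = image_sentence_contrastive(torch.randn(8, 256), torch.randn(8, 256))
```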
35. Survey About Generative Adversarial Network and Text-to-Image Synthesis.
- Author
-
LAI Li'na, MI Yu, ZHOU Longlong, RAO Jiyong, XU Tianyang, and SONG Xiaoning
- Subjects
Generative adversarial networks, Deep learning
- Abstract
With the popularity of multi-sensors, multi-modal data has received continuous attention from scientific research and industry. The technology of processing multi-source modal information through deep learning is the core. Text-to-image generation is one of the directions of multi-modal technology. Because the images generated by generative adversarial network (GAN) are more realistic, the generation of text images has made excellent progress. It can be used in many fields such as image editing and colorization, style transfer, object deformation, and photo enhancement, etc. In this review, GAN networks based on image generation function are divided into four categories: semantic-enhanced GAN, growth-able GAN, diversity-enhanced GAN, and intelligence-enhanced GAN. According to the direction provided by the taxonomy, the function-based text image generation models are integrated and compared to clarify the context. The existing evaluation indicators and commonly used data sets are analyzed, and the feasibility and future development trend of complex text processing are clarified. This review systematically complements the analysis of generative adversarial networks in text image generation and will help researchers further advance this field. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
36. Enhanced Text-to-Image Synthesis With Self-Supervision
- Author
-
Yong Xuan Tan, Chin Poo Lee, Mai Neo, Kian Ming Lim, and Jit Yan Lim
- Subjects
Text-to-image synthesis, generative model, GAN, self-supervised learning, generative adversarial networks, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
The task of Text-to-Image synthesis is a difficult challenge, especially when dealing with low-data regimes, where the number of training samples is limited. In order to address this challenge, the Self-Supervision Text-to-Image Generative Adversarial Networks (SS-TiGAN) has been proposed. The method employs a bi-level architecture, which allows for the use of self-supervision to increase the number of training samples by generating rotation variants. This, in turn, maximizes the diversity of the model representation and enables the exploration of high-level object information for more detailed image construction. In addition to the use of self-supervision, SS-TiGAN also investigates various techniques to address the stability issues that arise in Generative Adversarial Networks. By implementing these techniques, the proposed SS-TiGAN has achieved a new state-of-the-art performance on two benchmark datasets, Oxford-102 and CUB. These results demonstrate the effectiveness of the SS-TiGAN method in synthesizing high-quality, realistic images from text descriptions under low-data regimes.
- Published
- 2023
- Full Text
- View/download PDF
37. Text-Guided Image Manipulation via Generative Adversarial Network With Referring Image Segmentation-Based Guidance
- Author
-
Yuto Watanabe, Ren Togo, Keisuke Maeda, Takahiro Ogawa, and Miki Haseyama
- Subjects
Text-guided image manipulation, text-to-image synthesis, generative adversarial network, referring image segmentation, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
This study proposes a novel text-guided image manipulation method that introduces referring image segmentation into a generative adversarial network. The proposed text-guided image manipulation method aims to manipulate images containing multiple objects while preserving text-unrelated regions. The proposed method assigns the task of distinguishing between text-related and unrelated regions in an image to segmentation guidance based on referring image segmentation. With this architecture, the adversarial generative network can focus on generating new attributes according to the text description and reconstructing text-unrelated regions. For the challenging input images with multiple objects, the experimental results demonstrate that the proposed method outperforms conventional methods in terms of image manipulation precision.
- Published
- 2023
- Full Text
- View/download PDF
38. Recent Advances in Text-to-Image Synthesis: Approaches, Datasets and Future Research Prospects
- Author
-
Yong Xuan Tan, Chin Poo Lee, Mai Neo, Kian Ming Lim, Jit Yan Lim, and Ali Alqahtani
- Subjects
Text-to-image synthesis, generative model, GAN, generative adversarial networks, review, survey, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Text-to-image synthesis is a fascinating area of research that aims to generate images based on textual descriptions. The main goal of this field is to generate images that match the given textual description in terms of both semantic consistency and image realism. While text-to-image synthesis has shown remarkable progress in recent years, it still faces several challenges, mainly related to the level of image realism and semantic consistency. To address these challenges, various approaches have been proposed, which mainly rely on Generative Adversarial Networks (GANs) for optimal performance. This paper provides a review of the existing text-to-image synthesis approaches, which are categorized into four groups: image realism, multiple scene, semantic enhancement, and style transfer. In addition to discussing the existing approaches, this paper also reviews the widely used datasets for text-to-image synthesis, including Oxford-102, CUB-200-2011, and COCO. The evaluation metrics used in this field are also discussed, including Inception Score, Fréchet Inception Distance, Structural Similarity Index, R-precision, Visual-Semantic Similarity, and Semantic Object Accuracy. The paper also offers a compilation of the performance of existing works in the field.
- Published
- 2023
- Full Text
- View/download PDF
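Entry 38 above lists the Fréchet Inception Distance (FID) among its evaluation metrics. FID compares the Gaussian statistics of Inception activations for real and generated images: FID = ||mu_r − mu_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). A small NumPy/SciPy sketch follows, assuming the activations have already been extracted from an Inception network elsewhere.

```python
# Hypothetical FID sketch from precomputed activation statistics.
import numpy as np
from scipy.linalg import sqrtm

def frechet_inception_distance(acts_real, acts_gen):
    mu_r, mu_g = acts_real.mean(axis=0), acts_gen.mean(axis=0)
    cov_r = np.cov(acts_real, rowvar=False)
    cov_g = np.cov(acts_gen, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):                      # discard tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))

# toy 64-dimensional features for speed; real FID uses 2048-dim Inception pool features
fid = frechet_inception_distance(np.random.randn(500, 64), np.random.randn(500, 64))
```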
39. Aberrant AI creations: co-creating surrealist body horror using the DALL-E Mini text-to-image generator.
- Author
-
O'Meara, Jennifer and Murphy, Cáit
- Subjects
Social media, Memes, Horror, Artificial intelligence, Image databases, Internet users
- Abstract
The emergence in 2022 of surreal and grotesque image sets created using the free online AI text-to-image generator DALL-E Mini (Craiyon) prompts our analysis of their aesthetic content and connections to preexisting media forms and trends in digital culture. DALL-E Mini uses an unfiltered database of images from the internet to create new images based on a user's text prompt, often resulting in misshapen bodies and impossible scenarios. Despite its technological limitations, DALL-E Mini's popularity as a meme-making tool is visible on social media platforms, where crowd-sourced images are shared and experimentation with the tool is encouraged. Through comparison with existing artistic practices and formats (creative automata, surrealism, body horror, celebrity memes), we argue that DALL-E Mini creations can be understood as human-AI co-creations and forms of aesthetic mimicry. Building on the ideas of surrealists such as André Breton, we propose that DALL-E Mini's images, prompts and the grid interface adhere to surrealism's historical interests in the unconscious, the uncanny, and the collaborative 'exquisite corpse' parlour game. We also consider DALL-E Mini's relevance to the category of 'AI Arts', Patricia De Vries's call for more research that relates algorithms to the broader artistic and cultural contexts in which they are embedded (2020), and the 'authoring' of celebrity bodies as data (Kanai, 2016). Our theorisation of DALL-E Mini is supported by examples drawn from social media and personal experiments with the generator. Overall, we propose that internet users' experimentation with DALL-E Mini corresponds with a cultural moment in which AI imaging technologies are eliciting excitement and anxiety. The outputs are revealed to be reliant on users' pop cultural knowledge, with DALL-E Mini allowing for a playful, co-creative algorithmic practice, wherein contemporary anxieties about digital labour, (post)digital culture, biopolitics, and global issues are redirected into surreal visual storyworlds. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
40. Refining AttnGAN Using Attention on Attention Network
- Author
-
Bhise, Naitik, Krzyzak, Adam, Bui, Tien D., Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Krzyzak, Adam, editor, Suen, Ching Y., editor, Torsello, Andrea, editor, and Nobile, Nicola, editor
- Published
- 2022
- Full Text
- View/download PDF
41. Obj-SA-GAN: Object-Driven Text-to-Image Synthesis with Self-Attention Based Full Semantic Information Mining
- Author
-
Li, Ruijun, Li, Weihua, Yang, Yi, Bai, Quan, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Khanna, Sankalp, editor, Cao, Jian, editor, Bai, Quan, editor, and Xu, Guandong, editor
- Published
- 2022
- Full Text
- View/download PDF
42. TISE: Bag of Metrics for Text-to-Image Synthesis Evaluation
- Author
-
Dinh, Tan M., Nguyen, Rang, Hua, Binh-Son, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Avidan, Shai, editor, Brostow, Gabriel, editor, Cissé, Moustapha, editor, Farinella, Giovanni Maria, editor, and Hassner, Tal, editor
- Published
- 2022
- Full Text
- View/download PDF
43. AttnGAN: Realistic Text-to-Image Synthesis with Attentional Generative Adversarial Networks
- Author
-
Mathesul, Shubham, Bhutkar, Ganesh, Rambhad, Ayush, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ardito, Carmelo, editor, Lanzilotti, Rosa, editor, Malizia, Alessio, editor, Larusdottir, Marta, editor, Spano, Lucio Davide, editor, Campos, José, editor, Hertzum, Morten, editor, Mentler, Tilo, editor, Abdelnour Nocera, José, editor, Piccolo, Lara, editor, Sauer, Stefan, editor, and van der Veer, Gerrit, editor
- Published
- 2022
- Full Text
- View/download PDF
44. Review on Generative Adversarial Neural Networks (GAN) in Text-to-Image Synthesis
- Author
-
Che Aminudin, Muhamad Faris, Suandi, Shahrel Azmin, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Mahyuddin, Nor Muzlifah, editor, Mat Noor, Nor Rizuan, editor, and Mat Sakim, Harsa Amylia, editor
- Published
- 2022
- Full Text
- View/download PDF
45. An efficient multi-path structure with staged connection and multi-scale mechanism for text-to-image synthesis.
- Author
-
Ding, Jiajun, Liu, Beili, Yu, Jun, Guo, Huanlei, Shen, Ming, and Shen, Kenong
- Subjects
Long short-term memory, Feature extraction
- Abstract
Generating a realistic image which matches the given text description is a challenging task. The multi-stage framework obtains the high-resolution image by constructing a low-resolution image firstly, which is widely adopted for text-to-image synthesis task. However, subsequent stages of existing generator have to construct the whole image repeatedly, while the primitive features of the objects have been sketched out in the previously adjacent stage. In order to make the subsequent stages focus on enriching fine-grained details and improve the quality of the final generated image, an efficient multi-path structure is proposed for multi-stage framework in this paper. The proposed structure contains two parts: staged connection and multi-scale module. Staged connection is employed to transfer the feature maps of the generated image from previously adjacent stage to the end of current stage. Such path can avoid the requirement of long-term memory and guide the network focus on modifying and supplementing the details of generated image. In addition, the multi-scale module is explored to extract feature at different scales and generate image with more fine-grained details. The proposed multi-path structure can be introduced to multi-stage based algorithm such as StackGAN-v2 and AttnGAN. Extensive experiments are conducted on two widely used datasets, i.e. Oxford-102 and CUB dataset, for the text-to-image synthesis task. The results demonstrate the superior performance of the methods with multi-path structure over the base models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
46. Multi-scale dual-modal generative adversarial networks for text-to-image synthesis.
- Author
-
Jiang, Bin, Huang, Yun, Huang, Wei, Yang, Chao, and Xu, Fangqiang
- Subjects
Generative adversarial networks
- Abstract
Generating images from text descriptions is a challenging task due to the natural gap between the textual and visual modalities. Despite the promising results of existing methods, they suffer from two limitations: (1) they focus more on image semantic information while failing to fully explore texture information; (2) they only model the correlation between words and the image at a fixed scale, which decreases the diversity and discriminability of the network representations. To address the above issues, we propose Multi-scale Dual-modal Generative Adversarial Networks (MD-GAN). The core components of MD-GAN are the dual-modal modulation attention (DMA) and the multi-scale consistency discriminator (MCD). The DMA includes two blocks: the textual guiding module, which captures the correlation between images and text descriptions to rectify the image semantic content, and the channel sampling module, which adjusts image texture by selectively aggregating channel-wise information across the spatial dimensions. In addition, the MCD models the correlation between text and image regions of various sizes, enhancing the semantic consistency between text and images. Extensive experiments on the CUB and MS-COCO datasets show the superiority of MD-GAN over state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
47. TextControlGAN: Text-to-Image Synthesis with Controllable Generative Adversarial Networks.
- Author
-
Ku, Hyeeun and Lee, Minhyeok
- Subjects
Generative adversarial networks, Data augmentation, Interpolation spaces, Computer vision
- Abstract
Generative adversarial networks (GANs) have demonstrated remarkable potential in the realm of text-to-image synthesis. Nevertheless, conventional GANs employing conditional latent space interpolation and manifold interpolation (GAN-CLS-INT) encounter challenges in generating images that accurately reflect the given text descriptions. To overcome these limitations, we introduce TextControlGAN, a controllable GAN-based model specifically designed for text-to-image synthesis tasks. In contrast to traditional GANs, TextControlGAN incorporates a neural network structure, known as a regressor, to effectively learn features from conditional texts. To further enhance the learning performance of the regressor, data augmentation techniques are employed. As a result, the generator within TextControlGAN can learn conditional texts more effectively, leading to the production of images that more closely adhere to the textual conditions. Furthermore, by concentrating the discriminator's training efforts on GAN training exclusively, the overall quality of the generated images is significantly improved. Evaluations conducted on the Caltech-UCSD Birds-200 (CUB) dataset demonstrate that TextControlGAN surpasses the performance of the cGAN-based GAN-INT-CLS model, achieving a 17.6% improvement in Inception Score (IS) and a 36.6% reduction in Fréchet Inception Distance (FID). In supplementary experiments utilizing 128 × 128 resolution images, TextControlGAN exhibits a remarkable ability to manipulate minor features of the generated bird images according to the given text descriptions. These findings highlight the potential of TextControlGAN as a powerful tool for generating high-quality, text-conditioned images, paving the way for future advancements in the field of text-to-image synthesis. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
48. CAGAN: Text-To-Image Generation with Combined Attention Generative Adversarial Networks
- Author
-
Schulze, Henning, Yaman, Dogucan, Waibel, Alexander, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bauckhage, Christian, editor, Gall, Juergen, editor, and Schwing, Alexander, editor
- Published
- 2021
- Full Text
- View/download PDF
49. TRGAN: Text to Image Generation Through Optimizing Initial Image
- Author
-
Zhao, Liang, Li, Xinwei, Huang, Pingda, Chen, Zhikui, Dai, Yanqi, Li, Tianyu, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Mantoro, Teddy, editor, Lee, Minho, editor, Ayu, Media Anugerah, editor, Wong, Kok Wai, editor, and Hidayanto, Achmad Nizar, editor
- Published
- 2021
- Full Text
- View/download PDF
50. Image Stream From Paragraph Method Based on Scene Graph
- Author
-
ZHANG Wei-qi, TANG Yi-feng, LI Lin-yan, HU Fu-yuan
- Subjects
generative adversarial networks, graph convolutional network, scene layout, text-to-image synthesis, Computer software, QA76.75-76.765, Technology (General), T1-995
- Abstract
Generative adversarial networks can already produce relatively high-quality images for the task of generating image sequences from paragraphs. However, when the input text involves multiple objects and relationships, the context information of the text sequence is difficult to extract, the object layout of the generated image is prone to confusion, and the generated object details are insufficient. To solve this problem, this paper proposes a method of generating sequence images from scene graphs, based on StoryGAN. First, the paragraph is converted into multiple scene graphs through graph convolution, where each scene graph contains the object and relationship information of the corresponding text. Then, the bounding box and segmentation mask of each object are predicted to compute the scene layout. Finally, according to the scene layout and the context information, a sequence of images better matching the objects and their relationships is generated. Tests on the CLEVR-SV and CoDraw-SV datasets show that the method in this paper can generate 64×64-pixel sequence images containing multiple objects and their relationships. Experimental results show that on the CLEVR-SV dataset, the SSIM and FID of this method improve on StoryGAN by 1.34% and 9.49%, respectively. On the CoDraw-SV dataset, the ACC of this method is 7.40% higher than that of StoryGAN. The proposed method improves the rationality of the generated scene layout; it can not only generate an image sequence containing multiple objects and relationships, but also produces images of higher quality with clearer details.
- Published
- 2022
- Full Text
- View/download PDF