TmfimCLIP: Text-Driven Multi-Attribute Face Image Manipulation.
- Source :
- International Journal of Image & Graphics. Dec2024, p1. 21p.
- Publication Year :
- 2024
Abstract
- Text-to-image conversion has garnered significant research attention, with contemporary methods leveraging the latent space analysis of StyleGAN. However, issues with latent code decoupling, interpretability, and controllability often remain, leading to misaligned image attributes. To address these challenges, we propose a refined approach that segments StyleGAN’s latent code using the Visual Language Model (CLIP). Our method aligns the latent code segments with text embeddings via an image-text alignment module and modulates them through a text injection module. Additionally, we incorporate semantic segmentation loss and mouth loss to constrain operations that affect irrelevant attributes. Compared to previous CLIP-driven techniques, our approach significantly enhances decoupling, interpretability, and controllability. Experiments on the CelebA-HQ and FFHQ datasets validate our model’s efficacy through both qualitative and quantitative comparisons. Our model effectively handles a wide range of style variations, achieving an FID score of 21.15 for facial attributes and an ID metric of 0.88 for hair attributes. [ABSTRACT FROM AUTHOR]
- Subjects :
- *LANGUAGE models
- *HAIR
Details
- Language :
- English
- ISSN :
- 02194678
- Database :
- Academic Search Index
- Journal :
- International Journal of Image & Graphics
- Publication Type :
- Academic Journal
- Accession number :
- 181502374
- Full Text :
- https://doi.org/10.1142/s0219467827500069