Back to Search Start Over

X&Fuse: Fusing Visual Information in Text-to-Image Generation

Authors :
Kirstain, Yuval
Levy, Omer
Polyak, Adam
Publication Year :
2023

Abstract

We introduce X&Fuse, a general approach for conditioning on visual information when generating images from text. We demonstrate the potential of X&Fuse in three different text-to-image generation scenarios. (i) When a bank of images is available, we retrieve and condition on a related image (Retrieve&Fuse), resulting in significant improvements on the MS-COCO benchmark, gaining a state-of-the-art FID score of 6.65 in zero-shot settings. (ii) When cropped-object images are at hand, we utilize them and perform subject-driven generation (Crop&Fuse), outperforming the textual inversion method while being more than x100 faster. (iii) Having oracle access to the image scene (Scene&Fuse), allows us to achieve an FID score of 5.03 on MS-COCO in zero-shot settings. Our experiments indicate that X&Fuse is an effective, easy-to-adapt, simple, and general approach for scenarios in which the model may benefit from additional visual information.

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2303.01000
Document Type :
Working Paper