Tell, Imagine, and Search: End-to-end Learning for Composing Text and Image to Image Retrieval.

Authors :
FEIFEI ZHANG
MINGLIANG XU
CHANGSHENG XU
Source :
ACM Transactions on Multimedia Computing, Communications & Applications; Apr 2022, Vol. 18 Issue 2, p365-387, 23p
Publication Year :
2022

Abstract

Composing Text and Image to Image Retrieval (CTI-IR) is an emerging task in computer vision, which allows retrieving images relevant to a query image with text describing desired modifications to the query image. Conventional cross-modal retrieval approaches usually take data of one modality as the query to retrieve relevant data of another modality. In contrast to these existing methods, in this article we propose an end-to-end trainable network for simultaneous image generation and CTI-IR. The proposed model is based on a Generative Adversarial Network (GAN) and has several merits. First, it can learn a generative and discriminative feature for the query (a query image with a text description) by jointly training a generative model and a retrieval model. Second, our model can automatically manipulate the visual features of the reference image according to the text description through adversarial learning between the synthesized image and the target image. Third, global-local collaborative discriminators and attention-based generators are exploited, allowing our approach to focus on both the global and local differences between the query image and the target image. As a result, the semantic consistency and fine-grained details of the generated images are enhanced in our model. The generated image can also be used to interpret and empower our retrieval model. Quantitative and qualitative evaluations on three benchmark datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art methods. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1551-6857
Volume :
18
Issue :
2
Database :
Complementary Index
Journal :
ACM Transactions on Multimedia Computing, Communications & Applications
Publication Type :
Academic Journal
Accession number :
155947284
Full Text :
https://doi.org/10.1145/3478642