1. Unsupervised deep hashing with multiple similarity preservation for cross-modal image-text retrieval.
- Author
- Xiong, Siyu; Pan, Lili; Ma, Xueqiang; Hu, Qinghua; and Beckman, Eric
- Abstract
Deep hashing for cross-modal image-text retrieval offers low storage cost and high retrieval efficiency by mapping data from different modalities into a common Hamming space. However, existing unsupervised deep hashing methods generally rely on the intrinsic similarity information of each modality for structural matching and fail to fully account for the heterogeneous characteristics and semantic gaps between modalities, which results in the loss of latent semantic correlation and co-occurrence information across modalities. To address this problem, this paper proposes an unsupervised deep hashing with multiple similarity preservation (UMSP) method for cross-modal image-text retrieval. First, to enhance the representation ability of each modality's deep features, a modality-specific image-text feature extraction module is designed: an image network with a parallel structure and a text network are constructed from a vision-language pre-training image encoder and multi-layer perceptrons to capture the deep semantic information of each modality and learn a common hash code representation space. Then, to bridge the heterogeneity gap and improve the discriminability of the hash codes, a multiple similarity preservation module is built from three perspectives: the joint modal space, the cross-modal hash space, and the image modal space, which helps the network preserve the semantic similarity of the modalities. Experimental results on three benchmark datasets (Wikipedia, MIRFlickr-25K and NUS-WIDE) show that UMSP outperforms other unsupervised methods for cross-modal image-text retrieval.
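The sketch below illustrates, in broad strokes, the kind of pipeline the abstract describes: modality-specific heads map pre-extracted image and text features into a shared relaxed hash space, and a multi-view loss aligns similarity structure across a joint modal space, the cross-modal hash space, and the image modal space. This is not the authors' implementation; the layer sizes, the use of cosine similarity, the equal-weight fusion for the joint space, and the MSE alignment terms are all assumptions made for illustration.

```python
# Minimal sketch (assumptions, not the UMSP paper's code) of multi-similarity
# preserving cross-modal hashing: per-modality hash heads plus a loss that
# matches similarity matrices in three spaces.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HashHead(nn.Module):
    """MLP mapping modality features to n_bits relaxed hash codes in (-1, 1)."""
    def __init__(self, in_dim: int, hidden_dim: int, n_bits: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_bits),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.net(x))  # continuous relaxation of sign() for training

def cosine_sim(x: torch.Tensor) -> torch.Tensor:
    x = F.normalize(x, dim=1)
    return x @ x.t()

def multi_similarity_loss(img_feat, txt_feat, img_code, txt_code):
    """Preserve similarity from three assumed views: joint modal, cross-modal hash, image modal."""
    s_img = cosine_sim(img_feat)                 # image modal space (encoder features)
    s_txt = cosine_sim(txt_feat)
    s_joint = 0.5 * (s_img + s_txt)              # assumed fusion for the joint modal space
    s_hash = F.normalize(img_code, dim=1) @ F.normalize(txt_code, dim=1).t()  # cross-modal hash space
    return (F.mse_loss(s_hash, s_joint)                  # hash codes follow joint structure
            + F.mse_loss(cosine_sim(img_code), s_img))   # image hash codes keep image structure

if __name__ == "__main__":
    img_feat = torch.randn(8, 512)   # e.g. features from a vision-language pre-trained image encoder
    txt_feat = torch.randn(8, 300)   # e.g. text embedding features
    img_head, txt_head = HashHead(512, 256, 64), HashHead(300, 256, 64)
    loss = multi_similarity_loss(img_feat, txt_feat, img_head(img_feat), txt_head(txt_feat))
    loss.backward()
    print(float(loss))
```

At retrieval time one would binarize the relaxed codes (e.g. take their sign) and rank by Hamming distance; the training details, weighting, and exact similarity definitions in the actual UMSP method may differ.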
- Published
- 2024