Back to Search Start Over

Deep Adversarial Learning Triplet Similarity Preserving Cross-Modal Retrieval Algorithm

Authors :
Guokun Li
Zhen Wang
Shibo Xu
Chuang Feng
Xiaohan Yang
Nannan Wu
Fuzhen Sun
Source :
Mathematics, Vol 10, Iss 15, p 2585 (2022)
Publication Year :
2022
Publisher :
MDPI AG, 2022.

Abstract

The cross-modal retrieval task can return different modal nearest neighbors, such as image or text. However, inconsistent distribution and diverse representation make it hard to directly measure the similarity relationship between different modal samples, which causes a heterogeneity gap. To bridge the above-mentioned gap, we propose the deep adversarial learning triplet similarity preserving cross-modal retrieval algorithm to map different modal samples into the common space, allowing their feature representation to preserve both the original inter- and intra-modal semantic similarity relationship. During the training process, we employ GANs, which has advantages in modeling data distribution and learning discriminative representation, in order to learn different modal features. As a result, it can align different modal feature distributions. Generally, many cross-modal retrieval algorithms only preserve the inter-modal similarity relationship, which makes the nearest neighbor retrieval results vulnerable to noise. In contrast, we establish the triplet similarity preserving function to simultaneously preserve the inter- and intra-modal similarity relationship in the common space and in each modal space, respectively. Thus, the proposed algorithm has a strong robustness to noise. In each modal space, to ensure that the generated features have the same semantic information as the sample labels, we establish a linear classifier and require that the generated features’ classification results be consistent with the sample labels. We conducted cross-modal retrieval comparative experiments on two widely used benchmark datasets—Pascal Sentence and Wikipedia. For the image to text task, our proposed method improved the mAP values by 1% and 0.7% on the Pascal sentence and Wikipedia datasets, respectively. Correspondingly, the proposed method separately improved the mAP values of the text to image performance by 0.6% and 0.8% on the Pascal sentence and Wikipedia datasets, respectively. The experimental results show that the proposed algorithm is better than the other state-of-the-art methods.

Details

Language :
English
ISSN :
22277390
Volume :
10
Issue :
15
Database :
Directory of Open Access Journals
Journal :
Mathematics
Publication Type :
Academic Journal
Accession number :
edsdoj.ba5cb688eff84372aad468d38059df6a
Document Type :
article
Full Text :
https://doi.org/10.3390/math10152585