
Semi-supervised cross-modal hashing via modality-specific and cross-modal graph convolutional networks.

Authors :
Wu, Fei
Li, Shuaishuai
Gao, Guangwei
Ji, Yimu
Jing, Xiao-Yuan
Wan, Zhiguo
Source :
Pattern Recognition. Apr 2023, Vol. 136.
Publication Year :
2023

Abstract

• MCGCN builds, for the first time, a cross-modal graph and jointly learns modality-specific and modality-shared features for semi-supervised cross-modal hashing.
• MCGCN provides a three-channel network architecture, comprising two modality-specific channels and a cross-modal channel that models the cross-modal graph over heterogeneous image and text features.
• To effectively reduce the modality gap, network training is guided by an adversarial scheme.
• MCGCN obtains state-of-the-art semi-supervised cross-modal hashing performance.

Cross-modal hashing maps heterogeneous multimedia data into a Hamming space for retrieving relevant samples across modalities, and has received great research interest due to its rapid retrieval and low storage cost. In real-world applications, because manual annotation of multimedia data is costly, only a limited number of labeled samples are available alongside abundant unlabeled data. In recent years, several semi-supervised cross-modal hashing (SCH) methods have been presented. However, existing SCH works have not fully explored how to jointly exploit modality-specific (complementary) and modality-shared (correlated) information for retrieval. In this paper, we propose a novel SCH approach named Modality-specific and Cross-modal Graph Convolutional Networks (MCGCN). The network architecture contains two modality-specific channels and a cross-modal channel to learn modality-specific and shared representations, respectively. A graph convolutional network (GCN) is leveraged in each of the three channels to explore intra-modal and inter-modal similarity and to propagate semantic information from labeled to unlabeled data. The modality-specific and shared representations of each modality are fused with an attention scheme. To further reduce the modality gap, a discriminative model is designed that learns to classify the modality of a representation, and network training is guided by an adversarial scheme. Experiments on two widely used multi-modal datasets demonstrate that MCGCN outperforms state-of-the-art semi-supervised and supervised cross-modal hashing methods.
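The abstract describes the architecture only in prose. The sketch below shows, in PyTorch, how such a three-channel GCN with attention-based fusion, a tanh-relaxed hash layer, and a modality discriminator could be wired together. Every module name, dimension, and wiring choice here is an illustrative assumption inferred from the abstract, not the authors' implementation.

```python
# Minimal sketch of an MCGCN-style architecture, assuming PyTorch.
# All names, dimensions, and design details are illustrative assumptions,
# not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph-convolution step: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, feats, adj_norm):
        # adj_norm: normalized similarity adjacency, shape (n, n)
        return F.relu(self.linear(adj_norm @ feats))

class Channel(nn.Module):
    """A two-layer GCN channel (modality-specific or cross-modal)."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.gcn1 = GCNLayer(in_dim, hid_dim)
        self.gcn2 = GCNLayer(hid_dim, out_dim)

    def forward(self, feats, adj_norm):
        return self.gcn2(self.gcn1(feats, adj_norm), adj_norm)

class MCGCNSketch(nn.Module):
    def __init__(self, img_dim, txt_dim, hid_dim, rep_dim, code_len):
        super().__init__()
        # Two modality-specific channels and one shared cross-modal channel.
        self.img_channel = Channel(img_dim, hid_dim, rep_dim)
        self.txt_channel = Channel(txt_dim, hid_dim, rep_dim)
        self.cross_channel = Channel(rep_dim, hid_dim, rep_dim)
        # Assumption: both modalities are projected into a common space so
        # one joint graph can cover heterogeneous image and text features.
        self.img_proj = nn.Linear(img_dim, rep_dim)
        self.txt_proj = nn.Linear(txt_dim, rep_dim)
        self.attn = nn.Linear(rep_dim, 1)         # scalar attention per representation
        self.hash = nn.Linear(rep_dim, code_len)  # tanh output relaxes binary codes

    def fuse(self, specific, shared):
        # Attention-weighted fusion of modality-specific and shared representations.
        stacked = torch.stack([specific, shared], dim=1)    # (n, 2, rep_dim)
        weights = torch.softmax(self.attn(stacked), dim=1)  # (n, 2, 1)
        return (weights * stacked).sum(dim=1)               # (n, rep_dim)

    def forward(self, img, txt, adj_img, adj_txt, adj_cross):
        img_spec = self.img_channel(img, adj_img)
        txt_spec = self.txt_channel(txt, adj_txt)
        # The cross-modal channel runs on the joint graph over both modalities;
        # adj_cross covers all image nodes followed by all text nodes.
        joint = torch.cat([self.img_proj(img), self.txt_proj(txt)], dim=0)
        shared = self.cross_channel(joint, adj_cross)
        n = img.size(0)
        img_code = torch.tanh(self.hash(self.fuse(img_spec, shared[:n])))
        txt_code = torch.tanh(self.hash(self.fuse(txt_spec, shared[n:])))
        return img_code, txt_code

class ModalityDiscriminator(nn.Module):
    """Adversary that tries to classify a representation as image or text;
    training the main network to fool it shrinks the modality gap."""
    def __init__(self, rep_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(rep_dim, rep_dim // 2),
                                 nn.ReLU(),
                                 nn.Linear(rep_dim // 2, 1))

    def forward(self, rep):
        return self.net(rep)  # logit: image vs. text
```

At retrieval time, the relaxed codes would be binarized (e.g. `torch.sign(img_code)`) and compared by Hamming distance, which is what makes the scheme fast and storage-cheap as the abstract notes.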

Details

Language :
English
ISSN :
0031-3203
Volume :
136
Database :
Academic Search Index
Journal :
Pattern Recognition
Publication Type :
Academic Journal
Accession number :
161280465
Full Text :
https://doi.org/10.1016/j.patcog.2022.109211