1. Deep Unsupervised Hashing for Large-Scale Cross-Modal Retrieval Using Knowledge Distillation Model
- Authors
Mingyong Li, Lirong Tang, Degang Yang, Shuang Peng, Qiqi Li, and Yan Ma
- Subjects
Computer science, Information storage and retrieval, Information management, Hash function, Data retrieval, Discriminative model, Semantics, Binary code, Pattern recognition, Artificial intelligence, Benchmark (computing)
- Abstract
Cross-modal hashing encodes heterogeneous multimedia data into compact binary codes to enable fast and flexible retrieval across different modalities. Owing to its low storage cost and high retrieval efficiency, it has received widespread attention. Supervised deep hashing significantly improves search performance and usually yields more accurate results, but it requires extensive manual annotation of the data. In contrast, unsupervised deep hashing struggles to achieve satisfactory performance because it lacks reliable supervisory information. To address this problem, and inspired by knowledge distillation, we propose a novel unsupervised knowledge distillation cross-modal hashing method based on semantic alignment (SAKDH). SAKDH reconstructs a similarity matrix from the latent correlation information of a pretrained unsupervised teacher model and uses this reconstructed matrix to guide a supervised student model. Specifically, the teacher model first adopts an unsupervised semantic alignment hashing method to construct a modal-fusion similarity matrix; the student model then learns under the teacher's distilled supervision to generate more discriminative hash codes. Experimental results on two widely used benchmark datasets (MIRFLICKR-25K and NUS-WIDE) show that, compared with several representative unsupervised cross-modal hashing methods, the proposed method achieves a significant improvement in mean average precision (MAP), demonstrating its effectiveness for large-scale cross-modal data retrieval.
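The abstract describes a two-stage pipeline: the teacher builds a modal-fusion similarity matrix from unsupervised features, and the student hashing networks are trained so that their code similarities match it. Below is a minimal PyTorch sketch of that idea, assuming cosine-based intra-modal similarities and a tanh-relaxed hashing branch; the names (fuse_similarity, HashNet, distill_loss), the fusion weight alpha, and the loss form are illustrative assumptions, not the paper's actual formulation.

```python
# Minimal sketch of the teacher-student idea from the abstract (assumed
# formulation, not the paper's exact method).
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_similarity(img_feat, txt_feat, alpha=0.5):
    """Teacher step: build a modal-fusion similarity matrix from
    pretrained (unsupervised) image and text features."""
    img = F.normalize(img_feat, dim=1)
    txt = F.normalize(txt_feat, dim=1)
    s_img = img @ img.t()  # intra-modal cosine similarity (image)
    s_txt = txt @ txt.t()  # intra-modal cosine similarity (text)
    return alpha * s_img + (1 - alpha) * s_txt  # fused similarity matrix

class HashNet(nn.Module):
    """Student hashing branch: maps one modality's features to
    approximately binary codes via a tanh relaxation."""
    def __init__(self, in_dim, code_len=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, code_len))

    def forward(self, x):
        return torch.tanh(self.net(x))  # in (-1, 1); sign() at retrieval time

def distill_loss(b_img, b_txt, s_teacher):
    """Align the students' cross-modal code similarities with the
    teacher's fused similarity matrix."""
    k = b_img.size(1)
    s_cross = b_img @ b_txt.t() / k  # scaled to roughly [-1, 1]
    return F.mse_loss(s_cross, s_teacher)

if __name__ == "__main__":
    n, d_img, d_txt = 32, 4096, 1386  # toy batch; dims are placeholders
    img_feat, txt_feat = torch.randn(n, d_img), torch.randn(n, d_txt)
    s_teacher = fuse_similarity(img_feat, txt_feat)  # teacher guidance
    img_net, txt_net = HashNet(d_img), HashNet(d_txt)
    loss = distill_loss(img_net(img_feat), txt_net(txt_feat), s_teacher)
    loss.backward()  # one student gradient step (optimizer omitted)
    print(f"distillation loss: {loss.item():.4f}")
```

At retrieval time, both branches would emit sign(HashNet(x)) as binary codes, so cross-modal search reduces to fast Hamming-distance ranking; that final quantization step is the standard practice the tanh relaxation approximates during training.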
- Published
2021