Enhancing Cross-Modal Retrieval Based on Modality-Specific and Embedding Spaces
- Source :
- IEEE Access, Vol 8, Pp 96777-96786 (2020)
- Publication Year :
- 2020
- Publisher :
- IEEE (Institute of Electrical and Electronics Engineers), 2020.
Abstract
- A new approach that drastically improves cross-modal retrieval performance in vision and language (hereinafter referred to as “vision and language retrieval”) is proposed in this paper. Vision and language retrieval takes data of one modality as a query to retrieve relevant data of another modality, and it enables flexible retrieval across different modalities. Most of the existing methods learn optimal embeddings of visual and lingual information to a single common representation space. However, we argue that the forced embedding optimization results in loss of key information for sentences and images. In this paper, we propose an effective utilization of representation spaces in a simple but robust vision and language retrieval method. The proposed method makes use of multiple individual representation spaces through text-to-image and image-to-text models. Experimental results showed that the proposed approach enhances the performance of existing methods that embed visual and lingual information to a single common representation space.
- Subjects :
- Generative adversarial networks
General Computer Science
Multimedia information retrieval
Computer science
Feature extraction
vision and language
02 engineering and technology
010501 environmental sciences
computer.software_genre
Semantics
01 natural sciences
cross-modal retrieval
0202 electrical engineering, electronic engineering, information engineering
Training
General Materials Science
Representation (mathematics)
0105 earth and related environmental sciences
Visualization
Modality (human–computer interaction)
business.industry
image-to-text model
General Engineering
Computational modeling
Gallium nitride
Key (cryptography)
Embedding
020201 artificial intelligence & image processing
Artificial intelligence
lcsh:Electrical engineering. Electronics. Nuclear engineering
text-to-image model
business
computer
lcsh:TK1-9971
Natural language processing
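The abstract does not say how the separate representation spaces are combined at retrieval time. As a rough illustration only, the sketch below assumes a text-to-image query and scores each gallery image by a weighted sum of three cosine similarities: one in a common embedding space, one in image feature space (comparing an image generated from the query by a text-to-image model with each gallery image), and one in text feature space (comparing the query with a caption generated for each gallery image by an image-to-text model). The fusion rule, the weights, and every function and variable name here are hypothetical, and the model outputs are stubbed with toy vectors rather than produced by real networks.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fused_text_to_image_scores(
    query_common: Vector,            # query sentence embedded in the common space
    gallery_common: List[Vector],    # gallery images embedded in the common space
    generated_image: Vector,         # features of an image generated from the query (text-to-image model)
    gallery_images: List[Vector],    # image features of each gallery image
    query_text: Vector,              # text features of the query sentence
    gallery_captions: List[Vector],  # text features of captions generated per gallery image (image-to-text model)
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> List[float]:
    """Hypothetical late fusion: weighted sum of similarities from three spaces."""
    w_common, w_image, w_text = weights
    scores = []
    for j in range(len(gallery_common)):
        score = (
            w_common * cosine(query_common, gallery_common[j])
            + w_image * cosine(generated_image, gallery_images[j])
            + w_text * cosine(query_text, gallery_captions[j])
        )
        scores.append(score)
    return scores


def rank(scores: List[float]) -> List[int]:
    """Gallery indices sorted from most to least relevant."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy stand-ins for model outputs (not real features).
query_common = [1.0, 0.0]
gallery_common = [[0.6, 0.8], [0.8, 0.6]]
generated_image = [1.0, 0.0]
gallery_images = [[1.0, 0.0], [0.0, 1.0]]
query_text = [0.0, 1.0]
gallery_captions = [[0.0, 1.0], [1.0, 1.0]]

inputs = (query_common, gallery_common, generated_image,
          gallery_images, query_text, gallery_captions)
common_only = fused_text_to_image_scores(*inputs, weights=(1.0, 0.0, 0.0))
fused = fused_text_to_image_scores(*inputs)

print("common space only:", rank(common_only))  # [1, 0]
print("fused:", rank(fused))                    # [0, 1]
```

With these toy vectors, the common space alone prefers image 1, while the fused score prefers image 0 because both modality-specific spaces favor it. This only illustrates the kind of complementary signal the abstract appeals to; it does not reproduce the paper's method or its reported results.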
Details
- Language :
- English
- ISSN :
- 2169-3536
- Volume :
- 8
- Database :
- OpenAIRE
- Journal :
- IEEE Access
- Accession number :
- edsair.doi.dedup.....968b46e628a6842387a8209c1061163c