An end-to-end image-text matching approach considering semantic uncertainty.
- Source :
- Neurocomputing. Nov 2024, Vol. 607.
- Publication Year :
- 2024
Abstract
- We propose a novel end-to-end image-text matching approach considering semantic uncertainty (SU-ITM), aiming to deal with the one-to-many semantic diversity involved in image-text matching in order to capture the associations between them more comprehensively and improve the robustness of the model. Traditional methods map images and texts as definite points in an embedding space to measure cross-modal similarity. However, the point-based embedding cannot capture the semantic uncertainty, leading to a large bias in the matching results. To address this problem, we model the one-to-many associations between image and text in a way that establishes a probability distribution, incorporating the uncertainty information into the final semantic representation of the text. In addition, we optimize the image-text matching loss so that the different text features approximate the image features in a distributed manner while maintaining the discriminative nature of the semantic representation, effectively reducing the matching uncertainty. Notably, our method achieves end-to-end training by not using pre-trained target detection branches throughout the training process. We fully demonstrate the excellent performance of our method in the image-text matching task through experimental validation on Flickr30k and MSCOCO. Excellent performance levels of 546.1 and 545.0 are achieved on the R@SUM metric for Flickr30k and MSCOCO 1k, respectively. [ABSTRACT FROM AUTHOR]
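The abstract reports results on R@SUM without defining it. As a minimal sketch, assuming the convention common in image-text retrieval work (R@SUM is the sum of Recall@1, Recall@5, and Recall@10 in both retrieval directions, so the maximum is 600), the metric could be computed from a similarity matrix as below. The function names `recall_at_k` and `r_at_sum`, and the `text_to_image` index array, are hypothetical and not taken from the paper.

```python
import numpy as np


def recall_at_k(ranks, ks=(1, 5, 10)):
    """Percentage of queries whose first correct item has a 0-based rank below k."""
    ranks = np.asarray(ranks)
    return [100.0 * np.mean(ranks < k) for k in ks]


def r_at_sum(sim, text_to_image):
    """Sum of R@1, R@5, R@10 for image-to-text and text-to-image retrieval.

    sim[i, j] is the similarity between image i and text j.
    text_to_image[j] is the index of the image that text j describes.
    Assumes every image has at least one matching text.
    """
    sim = np.asarray(sim, dtype=float)
    text_to_image = np.asarray(text_to_image)
    n_images, n_texts = sim.shape

    # Image-to-text: rank of the best-placed matching caption for each image.
    i2t_ranks = []
    for i in range(n_images):
        order = np.argsort(-sim[i])           # texts by descending similarity
        matches = text_to_image[order] == i   # True where the ranked text belongs to image i
        i2t_ranks.append(np.argmax(matches))  # position of the first match

    # Text-to-image: rank of the single matching image for each text.
    t2i_ranks = []
    for j in range(n_texts):
        order = np.argsort(-sim[:, j])        # images by descending similarity
        t2i_ranks.append(np.argmax(order == text_to_image[j]))

    return sum(recall_at_k(i2t_ranks)) + sum(recall_at_k(t2i_ranks))
```

Under this assumed definition, the reported scores of 546.1 and 545.0 would be read against a ceiling of 600. On Flickr30k and MSCOCO each image typically has several captions, which is why the image-to-text direction takes the best-ranked matching caption.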
- Subjects :
- *DISTRIBUTION (Probability theory)
Details
- Language :
- English
- ISSN :
- 0925-2312
- Volume :
- 607
- Database :
- Academic Search Index
- Journal :
- Neurocomputing
- Publication Type :
- Academic Journal
- Accession number :
- 179499499
- Full Text :
- https://doi.org/10.1016/j.neucom.2024.128386