Back to Search
Start Over
Cross-modality earth mover's distance-driven convolutional neural network for different-modality data.
- Source :
-
Neural Computing & Applications . Jul2020, Vol. 32 Issue 13, p9581-9592. 12p. - Publication Year :
- 2020
-
Abstract
- Cross-modality matching refers to the problem of comparing similarity/dissimilarity of a pair of data points of different modalities, such as an image and a text. Deep neural networks have been popular to represent data points of different modalities due to their ability to extract effective features. However, existing works use simple distance metrics to compare the deep features of multiple modalities, which do not fit the nature of cross-modality matching, because it imposes the features of different modalities to be of the same dimension and do not allow cross-feature matching. To solve this problem, we propose to use convolutional neural network (CNN) models with soft-max activation layer to represent a pair of different-modality data points to two histograms (not necessarily of the same dimensions) and compare their dissimilarity by using earth mover's distance (EMD). The EMD can match the features extracted by the two CNN models of different modalities freely. Moreover, we develop a joint learning framework to learn the CNN parameters specifically for the EMD-driven comparison, supervised by the relevance/irrelevance labels of the data pairs of different modalities. The experiments over applications such as image–text retrieval, and malware detection show its advantage over existing cross-modality matching methods. [ABSTRACT FROM AUTHOR]
- Subjects :
- *CONVOLUTIONAL neural networks
*STORAGE & moving industry
Subjects
Details
- Language :
- English
- ISSN :
- 09410643
- Volume :
- 32
- Issue :
- 13
- Database :
- Academic Search Index
- Journal :
- Neural Computing & Applications
- Publication Type :
- Academic Journal
- Accession number :
- 143783211
- Full Text :
- https://doi.org/10.1007/s00521-019-04471-8