Back to Search Start Over

Indexing Earth Mover’s Distance over Network Metrics.

Authors :
Wang, Ting
Meng, Shicong
Bian, Jiang
Source :
IEEE Transactions on Knowledge & Data Engineering; Jun2015, Vol. 27 Issue 6, p1588-1601, 14p
Publication Year :
2015

Abstract

The Earth Mover’s Distance (EMD) is a well-known distance metric for data represented as probability distributions over a predefined feature space. Supporting EMD-based similarity search has attracted intensive research effort. Despite the plethora of literature, most existing solutions are optimized for $L^p$<alternatives><inline-graphic xlink:type="simple" xlink:href="wang-ieq1-2373359.gif"/> </alternatives> feature spaces (e.g., Euclidean space); while in a spectrum of applications, the relationships between features are better captured using networks. In this paper, we study the problem of answering $k$<alternatives> <inline-graphic xlink:type="simple" xlink:href="wang-ieq2-2373359.gif"/></alternatives>-nearest neighbor ( $k$<alternatives><inline-graphic xlink:type="simple" xlink:href="wang-ieq3-2373359.gif"/> </alternatives>-NN) queries under network-based EMD metrics (NEMD). We propose <sc>Oasis</sc>, a new access method which leverages the network structure of feature space and enables efficient NEMD-based similarity search. Specifically, <sc>Oasis</sc> employs three novel techniques: (i) Range Oracle, a scalable model to estimate the range of $k$<alternatives> <inline-graphic xlink:type="simple" xlink:href="wang-ieq4-2373359.gif"/></alternatives>-th nearest neighbor under NEMD, (ii) Boundary Index, a structure that efficiently fetches candidates within given range, and (iii) Network Compression Hierarchy, an incremental filtering mechanism that effectively prunes false positive candidates to save unnecessary computation. Through extensive experiments using both synthetic and real data sets, we confirmed that <sc>Oasis</sc> significantly outperforms the state-of-the-art methods in query processing cost. [ABSTRACT FROM PUBLISHER]

Details

Language :
English
ISSN :
10414347
Volume :
27
Issue :
6
Database :
Complementary Index
Journal :
IEEE Transactions on Knowledge & Data Engineering
Publication Type :
Academic Journal
Accession number :
102387154
Full Text :
https://doi.org/10.1109/TKDE.2014.2373359