Back to Search Start Over

Unconventional application of k-means for distributed approximate similarity search.

Authors :
Ortega, Felipe
Algar, Maria Jesus
de Diego, Isaac Martín
Moguerza, Javier M.
Source :
Information Sciences. Jan2023, Vol. 620, p208-234. 27p.
Publication Year :
2023

Abstract

• This paper presents MASK, a multilevel algorithm for approximate similarity search • MASK can distribute the index over as many computing nodes as we can afford. • Experimental results show the applicability of this novel indexing method • MASK achieves superior performance with high-dimensional and high-sparsity datasets. Similarity search based on a distance function in metric spaces is a fundamental problem for many applications. Queries for similar objects lead to the well-known machine learning task of nearest-neighbours identification. Many data indexing strategies, collectively known as Metric Access Methods (MAM), have been proposed to speed up these queries. Moreover, since exact approaches to solving similarity queries can be complex and time-consuming, alternative options have emerged to reduce query execution time, such as returning approximate results or resorting to distributed computing platforms. In this paper, we introduce MASK (Multilevel Approximate Similarity search with k -means), an unconventional application of the k -means algorithm as the foundation of a multilevel index structure for approximate similarity search suitable for metric spaces. We show that this method leverages inherent properties of k -means for this purpose, like representing high-density data areas with fewer prototypes. An implementation of this new indexing procedure is evaluated using a synthetic dataset and two real-world datasets in high-dimensional and high-sparsity spaces. Experimental tests show that MASK performs better than alternative algorithms for approximate similarity search. Results are promising and underpin the applicability of this novel indexing method in multiple domains. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
00200255
Volume :
620
Database :
Academic Search Index
Journal :
Information Sciences
Publication Type :
Periodical
Accession number :
160632029
Full Text :
https://doi.org/10.1016/j.ins.2022.11.024