1. VBLSH: Volume-balancing locality-sensitive hashing algorithm for K-nearest neighbors search
- Author
-
Huixia Lai, Ruliang Xiao, Xinhong Lin, Lulu Zhang, Weilin Chen, and Shi Zhang
- Subjects
Information Systems and Management ,Optimization problem ,Computer science ,Hash function ,Computer Science Applications ,Theoretical Computer Science ,k-nearest neighbors algorithm ,Locality-sensitive hashing ,Set (abstract data type) ,Artificial Intelligence ,Control and Systems Engineering ,Algorithmic efficiency ,Hypercube ,Precision and recall ,Algorithm ,Computer Science::Databases ,Software - Abstract
K-nearest neighbors search (K-NNS) is a fundamental problem in many areas of machine learning and data mining. In an attempt to solve NNS problems by locality-sensitive hashing (LSH)-based algorithms and avoid the F1-trap, in this paper, we theoretically prove the necessity of maintaining close edges of each dimension, the independence of the partition location and the precision of retrieval. Based on the theories, a new method is proposed for constructing the hash function family. Then, a volume-balancing locality-sensitive hashing (VBLSH) algorithm is implemented. In this algorithm, the problem of constructing the hash function family is transformed into an optimization problem under a fixed number of hypercubes. By solving the optimization problem, different hyperplane intervals are set for each dimension to ensure that the hash functions can evenly segment the data space, where each segmented hypercube has a close volume and close side lengths. Finally, the analysis of the algorithm shows that the VBLSH algorithm has a high index coding efficiency and hash code calculation efficiency. The experimental results show that the VBLSH algorithm can simultaneously take into account the efficiency, precision and recall rate, and avoid the F1-trap; it shows excellent effectiveness on 4 datasets.
- Published
- 2022
- Full Text
- View/download PDF