Descriptor: "Nearest neighbor search" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Nearest neighbor search"' showing total 5,426 results

Start Over Descriptor "Nearest neighbor search"

5,426 results on '"Nearest neighbor search"'

1. Secure Approximate Nearest Neighbor Search with Locality-Sensitive Hashing

Author: Song, Shang, Liu, Lin, Chen, Rongmao, Peng, Wei, Wang, Yi, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Tsudik, Gene, editor, Conti, Mauro, editor, Liang, Kaitai, editor, and Smaragdakis, Georgios, editor
Published: 2024
Full Text: View/download PDF

2. Revisiting Nearest-Neighbor-Based Information Set Decoding

Author: Esser, Andre, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, and Quaglia, Elizabeth A., editor
Published: 2024
Full Text: View/download PDF

3. Storage and Management of Ship Position Based on Geographic Grid Coding and Its Efficiency Analysis in Neighborhood Search—A Case Study of Shipwreck Rescue and Google S2.

Author: Jiang, Bohui, Zhou, Weifeng, and Han, Haibin
Subjects: COLLISIONS at sea, SHIPWRECKS, NEIGHBORHOODS, EUCLIDEAN distance, SHIPS, RESCUE work, FISHERIES
Abstract: The marine fishery is a high-risk industry, and when a fishing vessel is in distress at sea, it is essential to protect the lives of the fishermen by implementing a fast and effective rescue for the vessel in distress. In this process, quickly retrieving the closest neighboring vessel in distress based on an effective rescue time is critical to the vessel's rescue. In order to improve the time efficiency of retrieving the nearest available rescue vessels in the current sea area, this paper proposes a method based on the Google S2 grid coding for ships in distress at sea. The method is divided into three parts: (1) encoding the ship's position based on the Google S2 algorithm, (2) retrieving the set of available ships in the current sea area based on the effective rescue distance and its corresponding coding level, and (3) sorting the set of available ships in the current sea area according to the proximity to the ship in distress by using the method of "Alternating Grid Sorting Based on Neighborhoods and Different Coding Levels". The effective rescue distance is set by the rescue time, the type of rescue vessel, and the speed. This paper sets the simulation experiment area as the East China Sea area. Different magnitudes of ship position datasets (1 × 102, 1 × 103, 1 × 104, 1 × 105, 1 × 106) are generated by simulating the scenarios where a ship's location is reported or collected by AIS or VMS. The temporal retrieval efficiencies of querying based on the two methods, the Euclidean distance and the Google S2 grid encoding, are compared and analyzed. The experimental results show that the total time consumed by the query method based on the Google S2 grid encoding and the query method based on Euclidean distance is reduced by 37.06%, 29.83%, 72.75%, 94.43%, and 94.53%, respectively, in the process of generating the set of rescuable ships retrieved, based on the effective rescue distance. Therefore, the time retrieval efficiency of the maritime vessel search and rescue method based on the Google S2 coding is high, which can effectively improve the query efficiency of rescue vessels in the neighborhood of distressed vessels. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

4. 基于容忍因子的近似最近邻混合查询算法.

Author: 贺广福, 薛源海, 陈翠婷, 俞晓明, 刘欣然, and 程学旗
Abstract: Approximate nearest neighbor search (ANNS) is an important technique in the field of computer science for efficient similarity search, enabling fast information retrieval in large-scale datasets. With the increasing demand for high-precision information retrieval, there is a growing use of the hybrid query method that utilizes both structured and unstructured information for querying, which has broad application prospects. However, filtered greedy algorithms based on nearest neighbor graphs may reduce the connectivity of the graph due to the impact of structural constraints in hybrid queries, ultimately damaging search accuracy. This article proposes a tolerance factor based filtered greedy algorithm, which controls the participation of vertices that do not meet structural constraints in routing through a tolerance factor, maintaining the connectivity of the original nearest neighbor graph without changing the index structure, and overcoming the negative impact of structural constraints on retrieval accuracy. The experimental results demonstrate that the new method can achieve high-precision search for ANNS under different levels of structural constraints, while maintaining retrieval efficiency. This study solves the problem of ANNS based on nearest neighbor graphs in hybrid query scenarios, providing an effective solution for fast hybrid query information retrieval in large-scale datasets. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

5. LFPeers: Temporal similarity search and result exploration.

Author: Sachdeva, Madhav, Burmeister, Jan, Kohlhammer, Jörn, and Bernard, Jürgen
Subjects: *CONCEPT learning, *VISUAL analytics, *BEHAVIORAL assessment, *COVID-19 pandemic, *PEERS, *NATURAL gas prospecting
Abstract: In this paper, we introduce a general concept for the analysis of temporal and multivariate data and the system LFPeers that applies this concept to temporal similarity search and results exploration. The conceptual workflow divides the analysis in two phases: a search phase to find the most similar objects to a query object before a time point t 0 in the temporal data, and an exploration phase to analyze and contextualize this subset of objects after t 0. LFPeers enables users to search for peers through interactive similarity search and filtering, explore interesting behavior of this peer group, and learn from peers through the assessment of diverging behaviors. We present the conceptual workflow to learn from peers and the LFPeers system with novel interfaces for search and exploration in temporal and multivariate data. An earlier workshop publication for LFPeers included a usage scenario targeting epidemiologists and the public who want to learn from the Covid-19 pandemic and distinguish successful and ineffective measures. In this extended paper, we now show how our concept is generalized and applied by domain experts in two case studies, including a novel case on stocks data. Finally, we reflect on the new state of development and on the insights gained by the experts in the case studies on the search and exploration of temporal data to learn from peers. [Display omitted] • Conceptual framework for temporal and multimodal similarity search and exploration. • Visual Analytics system for cause–effect analysis on temporal data. • Interactive user-defined search and exploration of data objects by similarity. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

6. 基于优先级队列的隧道无序点云快速法线全局定向方法.

Author: 李　纯, 黄新文, 周清华, 薛宇腾, 杨璟林, and 宋　浩
Abstract: Copyright of Railway Standard Design is the property of Railway Standard Design Editorial Office and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2023
Full Text: View/download PDF

7. LayerLSB:Nearest Neighbors Search Based on Layered Locality Sensitive B-tree

Author: DING Jiwen, LIU Zhuojin, WANG Jiaxing, ZHANG Yanfeng, YU Ge
Subjects: nearest neighbor search, layered structure, locality sensitive hashing, locality sensitive b-tree, Computer software, QA76.75-76.765, Technology (General), T1-995
Abstract: Nearest neighbor search has become a significant research problem due to its wide applications.Traditional spatial index structures such as R-tree and KD-tree can efficiently return accurate nearest neighbor search results in low-dimensional space,but they are not suitable for high-dimensional space.Locality sensitive B-tree(LSB) hashes data points to the sortable one-dimension values and arranges them in a tree-like structure,which dramatically improves the space and query efficiency of the previous locality sensitive hashing(LSH) implementations,without compromising the resulting quality.However,LSB fails to take data distribution into account.It performs well in a uniform data distribution setting,but exhibits unstable performance when the data are skewed.In response to this problem,this paper proposes LayerLSB,which reconstructs the hash values in a dense range by exploring the density of the hash values to make the distribution more uniform,so as to improve the query efficiency.Compared to LSB,LayerLSB indices become more targeted in terms of data distribution,and a multi-layered structure is constructed.Compared with the simple rehashing method,the multi-layered approach will still guarantee the search quality by carefully choosing the number of groups and hash functions.The results show that the query cost can be reduced to 44.6% of the original at most when achieving the same query accuracy.
Published: 2023
Full Text: View/download PDF

8. Storage and Management of Ship Position Based on Geographic Grid Coding and Its Efficiency Analysis in Neighborhood Search—A Case Study of Shipwreck Rescue and Google S2

Author: Bohui Jiang, Weifeng Zhou, and Haibin Han
Subjects: ship rescue, effective rescue time, gridding, ship position code, neighborhood search, nearest neighbor search, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Abstract: The marine fishery is a high-risk industry, and when a fishing vessel is in distress at sea, it is essential to protect the lives of the fishermen by implementing a fast and effective rescue for the vessel in distress. In this process, quickly retrieving the closest neighboring vessel in distress based on an effective rescue time is critical to the vessel’s rescue. In order to improve the time efficiency of retrieving the nearest available rescue vessels in the current sea area, this paper proposes a method based on the Google S2 grid coding for ships in distress at sea. The method is divided into three parts: (1) encoding the ship’s position based on the Google S2 algorithm, (2) retrieving the set of available ships in the current sea area based on the effective rescue distance and its corresponding coding level, and (3) sorting the set of available ships in the current sea area according to the proximity to the ship in distress by using the method of “Alternating Grid Sorting Based on Neighborhoods and Different Coding Levels”. The effective rescue distance is set by the rescue time, the type of rescue vessel, and the speed. This paper sets the simulation experiment area as the East China Sea area. Different magnitudes of ship position datasets (1 × 102, 1 × 103, 1 × 104, 1 × 105, 1 × 106) are generated by simulating the scenarios where a ship’s location is reported or collected by AIS or VMS. The temporal retrieval efficiencies of querying based on the two methods, the Euclidean distance and the Google S2 grid encoding, are compared and analyzed. The experimental results show that the total time consumed by the query method based on the Google S2 grid encoding and the query method based on Euclidean distance is reduced by 37.06%, 29.83%, 72.75%, 94.43%, and 94.53%, respectively, in the process of generating the set of rescuable ships retrieved, based on the effective rescue distance. Therefore, the time retrieval efficiency of the maritime vessel search and rescue method based on the Google S2 coding is high, which can effectively improve the query efficiency of rescue vessels in the neighborhood of distressed vessels.
Published: 2024
Full Text: View/download PDF

9. PLDH: Pseudo-Labels Based Deep Hashing.

Author: Liu, Huawen, Yin, Minhao, Wu, Zongda, Zhao, Liping, Li, Qi, Zhu, Xinzhong, and Zheng, Zhonglong
Subjects: *CONVOLUTIONAL neural networks, *DEEP learning, *IMAGE retrieval, *FEATURE selection
Abstract: Deep hashing has received a great deal of attraction in large-scale data analysis, due to its high efficiency and effectiveness. The performance of deep hashing models heavily relies on label information, which is very expensive to obtain. In this work, a novel end-to-end deep hashing model based on pseudo-labels for large-scale data without labels is proposed. The proposed hashing model consists of two major stages, where the first stage aims to obtain pseudo-labels based on deep features extracted by a pre-training deep convolution neural network. The second stage generates hash codes with high quality by the same neural network in the previous stage, coupled with an end-to-end hash layer, whose purpose is to encode data into a binary representation. Additionally, a quantization loss is introduced and interwound within these two stages. Evaluation experiments were conducted on two frequently-used image collections, CIFAR-10 and NUS-WIDE, with eight popular shallow and deep hashing models. The experimental results show the superiority of the proposed method in image retrieval. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

10. Approximate Nearest Neighbor for Curves: Simple, Efficient, and Deterministic.

Author: Filtser, Arnold, Filtser, Omrit, and Katz, Matthew J.
Subjects: *DATA structures, *CURVES, *APPROXIMATE reasoning
Abstract: In the (1 + ε , r) -approximate near-neighbor problem for curves (ANNC) under some similarity measure δ , the goal is to construct a data structure for a given set C of curves that supports approximate near-neighbor queries: Given a query curve Q, if there exists a curve C ∈ C such that δ (Q , C) ≤ r , then return a curve C ′ ∈ C with δ (Q , C ′) ≤ (1 + ε) r . There exists an efficient reduction from the (1 + ε) -approximate nearest-neighbor problem to ANNC, where in the former problem the answer to a query is a curve C ∈ C with δ (Q , C) ≤ (1 + ε) · δ (Q , C ∗) , where C ∗ is the curve of C most similar to Q. Given a set C of n curves, each consisting of m points in d dimensions, we construct a data structure for ANNC that uses n · O (1 ε) md storage space and has O(md) query time (for a query curve of length m), where the similarity measure between two curves is their discrete Fréchet or dynamic time warping distance. Our method is simple to implement, deterministic, and results in an exponential improvement in both query time and storage space compared to all previous bounds. Further, we also consider the asymmetric version of ANNC, where the length of the query curves is k ≪ m , and obtain essentially the same storage and query bounds as above, except that m is replaced by k. Finally, we apply our method to a version of approximate range counting for curves and achieve similar bounds. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

11. DIOT: Detecting Implicit Obstacles from Trajectories

Author: Lei, Yifan, Huang, Qiang, Kankanhalli, Mohan, Tung, Anthony, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bhattacharya, Arnab, editor, Lee Mong Li, Janice, editor, Agrawal, Divyakant, editor, Reddy, P. Krishna, editor, Mohania, Mukesh, editor, Mondal, Anirban, editor, Goyal, Vikram, editor, and Uday Kiran, Rage, editor
Published: 2022
Full Text: View/download PDF

12. 基于车载LiDAR点云的路边地上物多阶段聚类分割算法.

Author: 李康弘, 李永强, 李佳佳, 任京智, 郝道前, and 王治尧
Subjects: POINT cloud, STREET signs, STREET lighting, URBAN trees, SEARCH algorithms
Abstract: Copyright of Geography & Geographic Information Science is the property of Geography & Geo-Information Science Editorial Office and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2023
Full Text: View/download PDF

13. Improved Assessment of Globularity of Protein Structures and the Ellipsoid Profile of the Biological Assemblies from the PDB.

Author: Banach, Mateusz
Subjects: *PROTEIN structure, *PROBABILITY density function, *ELLIPSOIDS, *PRINCIPAL components analysis, *OUTLIER detection, *DATABASES
Abstract: In this paper, we present an update to the ellipsoid profile algorithm (EP), a simple technique for the measurement of the globularity of protein structures without the calculation of molecular surfaces. The globularity property is understood in this context as the ability of the molecule to fill a minimum volume enclosing ellipsoid (MVEE) that approximates its assumed globular shape. The more of the interior of this ellipsoid is occupied by the atoms of the protein, the better are its globularity metrics. These metrics are derived from the comparison of the volume of the voxelized representation of the atoms and the volume of all voxels that can fit inside that ellipsoid (a uniform unit Å cube lattice). The so-called ellipsoid profile shows how the globularity changes with the distance from the center. Two of its values, the so-called ellipsoid indexes, are used to classify the structure as globular, semi-globular or non-globular. Here, we enhance the workflow of the EP algorithm via an improved outlier detection subroutine based on principal component analysis. It is capable of robust distinguishing between the dense parts of the molecules and, for example, disordered chain fragments fully exposed to the solvent. The PCA-based method replaces the current approach based on kernel density estimation. The improved EP algorithm was tested on 2124 representatives of domain superfamilies from SCOP 2.08. The second part of this work is dedicated to the survey of globularity of 3594 representatives of biological assemblies from molecules currently deposited in the PDB and analyzed by the 3DComplex database (monomers and complexes up to 60 chains). [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

14. Locality sensitive hashing with bit selection.

Author: Zhou, Wenhua, Liu, Huawen, Lou, Jungang, and Chen, Xin
Subjects: BINARY codes, IMAGE retrieval
Abstract: Locality sensitive hashing (LSH), one of the most popular hashing techniques, has attracted considerable attention for nearest neighbor search in the field of image retrieval. It can achieve promising performance only if the number of the generated hash bits is large enough. However, more hash bits assembled to the binary codes contain massive redundant information and require more time cost and storage spaces. To alleviate this limitation, we propose a novel bit selection framework to pick important bits out of the hash bits generated by hashing techniques. Within the bit selection framework, we further exploit eleven evaluation criteria to measure the importance and similarity of each bit generated by LSH, so that the bits with high importance and less similarity are selected to assemble new binary codes. To demonstrate the effectiveness of the proposed framework of bit selection, we evaluated the proposed framework with the evaluation criteria on five commonly used data sets. Experimental results show the proposed bit selection framework works effectively in different cases, and the performance of LSH has not been degraded significantly after redundant hash bits reduced by the evaluation criteria. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

15. The Descriptiveness of Feature Descriptors with Reduced Dimensionality

Author: Varga, Dániel, Szalai-Gindl, János Márk, Laki, Sándor, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Bellatreche, Ladjel, editor, Dumas, Marlon, editor, Karras, Panagiotis, editor, Matulevičius, Raimundas, editor, Awad, Ahmed, editor, Weidlich, Matthias, editor, Ivanović, Mirjana, editor, and Hartig, Olaf, editor
Published: 2021
Full Text: View/download PDF

16. Approximate Nearest Neighbor Search Using Query-Directed Dense Graph

Author: Wang, Hongya, Zhao, Zeng, Yang, Kaixiang, Song, Hui, Xiao, Yingyuan, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Jensen, Christian S., editor, Lim, Ee-Peng, editor, Yang, De-Nian, editor, Chang, Chia-Hui, editor, Xu, Jianliang, editor, Peng, Wen-Chih, editor, Huang, Jen-Wei, editor, and Shen, Chih-Ya, editor
Published: 2021
Full Text: View/download PDF

17. Nearest neighbor search on embeddings rapidly identifies distant protein relations

Author: Konstantin Schütze, Michael Heinzinger, Martin Steinegger, and Burkhard Rost
Subjects: homology search, protein embeddings, language models, nearest neighbor search, remote homology detection, Computer applications to medicine. Medical informatics, R858-859.7
Abstract: Since 1992, all state-of-the-art methods for fast and sensitive identification of evolutionary, structural, and functional relations between proteins (also referred to as “homology detection”) use sequences and sequence-profiles (PSSMs). Protein Language Models (pLMs) generalize sequences, possibly capturing the same constraints as PSSMs, e.g., through embeddings. Here, we explored how to use such embeddings for nearest neighbor searches to identify relations between protein pairs with diverged sequences (remote homology detection for levels of
Published: 2022
Full Text: View/download PDF

18. A Data-Integration Analysis on Road Emissions and Traffic Patterns

Author: Qu, Ao, Wang, Yu, Hu, Yue, Wang, Yanbing, Baroud, Hiba, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Nichols, Jeffrey, editor, Verastegui, Becky, editor, Maccabe, Arthur ‘Barney’, editor, Hernandez, Oscar, editor, Parete-Koon, Suzanne, editor, and Ahearn, Theresa, editor
Published: 2020
Full Text: View/download PDF

19. Confirmation Sampling for Exact Nearest Neighbor Search

Author: Christiani, Tobias, Pagh, Rasmus, Thorup, Mikkel, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Satoh, Shin'ichi, editor, Vadicamo, Lucia, editor, Zimek, Arthur, editor, Carrara, Fabio, editor, Bartolini, Ilaria, editor, Aumüller, Martin, editor, Jónsson, Björn Þór, editor, and Pagh, Rasmus, editor
Published: 2020
Full Text: View/download PDF

20. Towards Proximity Graph Auto-configuration: An Approach Based on Meta-learning

Author: Oyamada, Rafael Seidi, Shimomura, Larissa C., Junior, Sylvio Barbon, Kaster, Daniel S., Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Darmont, Jérôme, editor, Novikov, Boris, editor, and Wrembel, Robert, editor
Published: 2020
Full Text: View/download PDF

21. Probabilistic analysis of vantage point trees

Author: Vladyslav Bohun
Subjects: Fixed-point equation, machine learning, Markov chain, nearest neighbor search, probabilistic analysis, random tree, Applied mathematics. Quantitative methods, T57-57.97, Mathematics, QA1-939
Abstract: Probabilistic properties of vantage point trees are studied. A vp-tree built from a sequence of independent identically distributed points in ${[-1,\hspace{0.1667em}1]^{d}}$ with the ${\ell _{\infty }}$-distance function is considered. The length of the leftmost path in the tree, as well as partitions over the space it produces are analyzed. The results include several convergence theorems regarding these characteristics, as the number of nodes in the tree tends to infinity.
Published: 2021
Full Text: View/download PDF

22. PLDH: Pseudo-Labels Based Deep Hashing

Author: Huawen Liu, Minhao Yin, Zongda Wu, Liping Zhao, Qi Li, Xinzhong Zhu, and Zhonglong Zheng
Subjects: learning to hash, image retrieval, deep learning, nearest neighbor search, unsupervised learning, pseudo-label, Mathematics, QA1-939
Abstract: Deep hashing has received a great deal of attraction in large-scale data analysis, due to its high efficiency and effectiveness. The performance of deep hashing models heavily relies on label information, which is very expensive to obtain. In this work, a novel end-to-end deep hashing model based on pseudo-labels for large-scale data without labels is proposed. The proposed hashing model consists of two major stages, where the first stage aims to obtain pseudo-labels based on deep features extracted by a pre-training deep convolution neural network. The second stage generates hash codes with high quality by the same neural network in the previous stage, coupled with an end-to-end hash layer, whose purpose is to encode data into a binary representation. Additionally, a quantization loss is introduced and interwound within these two stages. Evaluation experiments were conducted on two frequently-used image collections, CIFAR-10 and NUS-WIDE, with eight popular shallow and deep hashing models. The experimental results show the superiority of the proposed method in image retrieval.
Published: 2023
Full Text: View/download PDF

23. Efficient Retrieval of Music Recordings Using Graph-Based Index Structures

Author: Frank Zalkow, Julian Brandner, and Meinard Müller
Subjects: indexing, music information retrieval, nearest neighbor search, efficiency, runtime, Applied mathematics. Quantitative methods, T57-57.97
Abstract: Flexible retrieval systems are required for conveniently browsing through large music collections. In a particular content-based music retrieval scenario, the user provides a query audio snippet, and the retrieval system returns music recordings from the collection that are similar to the query. In this scenario, a fast response from the system is essential for a positive user experience. For realizing low response times, one requires index structures that facilitate efficient search operations. One such index structure is the K-d tree, which has already been used in music retrieval systems. As an alternative, we propose to use a modern graph-based index, denoted as Hierarchical Navigable Small World (HNSW) graph. As our main contribution, we explore its potential in the context of a cross-version music retrieval application. In particular, we report on systematic experiments comparing graph- and tree-based index structures in terms of the retrieval quality, disk space requirements, and runtimes. Despite the fact that the HNSW index provides only an approximate solution to the nearest neighbor search problem, we demonstrate that it has almost no negative impact on the retrieval quality in our application. As our main result, we show that the HNSW-based retrieval is several orders of magnitude faster. Furthermore, the graph structure also works well with high-dimensional index items, unlike the tree-based structure. Given these merits, we highlight the practical relevance of the HNSW graph for music information retrieval (MIR) applications.
Published: 2021
Full Text: View/download PDF

24. Is it possible to find the single nearest neighbor of a query in high dimensions?

Author: Ting, Kai Ming, Washio, Takashi, Zhu, Ye, Xu, Yang, and Zhang, Kaifeng
Subjects: *PROBABILITY density function, *OPEN-ended questions, *CLASSIFICATION
Abstract: We investigate an open question in the study of the curse of dimensionality: Is it possible to find the single nearest neighbor of a query in high dimensions? Using the notion of (in)distinguishability to examine whether the feature map of a kernel is able to distinguish two distinct points in high dimensions, we analyze this ability of a metric-based Lipschitz continuous kernel as well as that of the recently introduced Isolation Kernel. Between the two kernels, we show that only Isolation Kernel has distinguishability and it performs consistently well in four tasks: indexed search for exact nearest neighbor search, anomaly detection using kernel density estimation, t-SNE visualization and SVM classification in both low and high dimensions, compared with distance, Gaussian and three other existing kernels. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

25. QuadScatter: Computational Efficiency in Simultaneous Transmissions for Large-Scale IoT Backscatter Networks

Author: Ousmane Zeba, Kentaro Hayashi, Kazuhiro Kizaki, Shunsuke Saruwatari, and Takashi Watanabe
Subjects: Computational cost, simultaneous backscatter transmissions, quadtree, quadtree quadrant, nearest neighbor search, exhaustive search, Electronic computers. Computer science, QA75.5-76.95, Information technology, T58.5-58.64
Abstract: Backscatter communication has attracted attention owing to its ultra-low-power consumption ability, which is expected to enhance internet of things (IoT) technology that aims to enable many novel applications for object-to-object communication. Such a network with a large and continuously increasing number of connected objects will benefit significantly from resource-saving. This work introduces a system named QuadScatter, which is a set of algorithms that select and associate transmitters, tags and readers to enable simultaneous backscatter transmissions and increase network capacity. Consequently, the energy consumption in the network is considerably lessened. Intensive simulations have been conducted to demonstrate the effectiveness of backscatter simultaneous transmissions. QuadScatter shows promising results compared to the exhaustive search algorithm. The simulation results highlight computational time and simultaneous transmission improvements of at least $ 250\rm \times$ and $ 2\rm \times$, respectively. Furthermore, while the exhaustive search is limited to a few nodes ($ < 20$), our proposal uses numerous nodes. Additionally, an implementation of limited simultaneous backscatter transmissions is conducted to show its feasibility in the real world.
Published: 2021
Full Text: View/download PDF

26. Improved Assessment of Globularity of Protein Structures and the Ellipsoid Profile of the Biological Assemblies from the PDB

Author: Mateusz Banach
Subjects: bioinformatics, biological assembly, bounding ellipsoid, globularity, kernel density, nearest neighbor search, Microbiology, QR1-502
Abstract: In this paper, we present an update to the ellipsoid profile algorithm (EP), a simple technique for the measurement of the globularity of protein structures without the calculation of molecular surfaces. The globularity property is understood in this context as the ability of the molecule to fill a minimum volume enclosing ellipsoid (MVEE) that approximates its assumed globular shape. The more of the interior of this ellipsoid is occupied by the atoms of the protein, the better are its globularity metrics. These metrics are derived from the comparison of the volume of the voxelized representation of the atoms and the volume of all voxels that can fit inside that ellipsoid (a uniform unit Å cube lattice). The so-called ellipsoid profile shows how the globularity changes with the distance from the center. Two of its values, the so-called ellipsoid indexes, are used to classify the structure as globular, semi-globular or non-globular. Here, we enhance the workflow of the EP algorithm via an improved outlier detection subroutine based on principal component analysis. It is capable of robust distinguishing between the dense parts of the molecules and, for example, disordered chain fragments fully exposed to the solvent. The PCA-based method replaces the current approach based on kernel density estimation. The improved EP algorithm was tested on 2124 representatives of domain superfamilies from SCOP 2.08. The second part of this work is dedicated to the survey of globularity of 3594 representatives of biological assemblies from molecules currently deposited in the PDB and analyzed by the 3DComplex database (monomers and complexes up to 60 chains).
Published: 2023
Full Text: View/download PDF

27. Tuning Database-Friendly Random Projection Matrices for Improved Distance Preservation on Specific Data.

Author: López-Sánchez, Daniel, de Bodt, Cyril, Lee, John A., Arrieta, Angélica González, and Corchado, Juan M.
Subjects: RANDOM matrices, DIMENSIONAL reduction algorithms, SEARCH algorithms
Abstract: Random Projection is one of the most popular and successful dimensionality reduction algorithms for large volumes of data. However, given its stochastic nature, different initializations of the projection matrix can lead to very different levels of performance. This paper presents a guided random search algorithm to mitigate this problem. The proposed method uses a small number of training data samples to iteratively adjust a projection matrix, improving its performance on similarly distributed data. Experimental results show that projection matrices generated with the proposed method result in a better preservation of distances between data samples. Conveniently, this is achieved while preserving the database-friendliness of the projection matrix, as it remains sparse and comprised exclusively of integers after being tuned with our algorithm. Moreover, running the proposed algorithm on a consumer-grade CPU requires only a few seconds. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

28. Collaborative filtering recommendation based on local sensitive hash in blockchain environment.

Author: WANG Jing and QIAN Xiao-dong
Abstract: To solve the issue of low recommendation performance caused by massive high-dimensional data in the blockchain environment, the local sensitive hash algorithm is optimized to reduce the calculation and storage overhead in the nearest neighbor search process. The principal component of the data distribution is used to reduce the poorly captured projection direction in the traditional LSH. Meanwhile, the projection vector weight is quantified, the interval of the hash bucket is adjusted, and the query result set is further refined according to the number of conflicts. Finally, a weighted average strategy is used to predict the score and generate a recommendation list. Experiments show that, compared with other algorithm indexes, the optimized LSH only needs a small amount of hash tables and hash functions to obtain accurate neighbor search results, and the search efficiency is greatly improved. The optimized LSH can well adapt to the characteristics of blockchain data, alleviate the impact of high-dimensional large-scale data on recommendation performance, and improve the recommendation quality and efficiency. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

29. Delaunay walk for fast nearest neighbor: accelerating correspondence matching for ICP.

Author: Anderson, James D., Raettig, Ryan M., Larson, Josh, Nykl, Scott L., Taylor, Clark N., and Wischgoll, Thomas
Abstract: Point set registration algorithms such as Iterative Closest Point (ICP) are commonly utilized in time-constrained environments like robotics. Finding the nearest neighbor of a point in a reference 3D point set is a common operation in ICP and frequently consumes at least 90% of the computation time. We introduce a novel approach to performing the distance-based nearest neighbor step based on Delaunay triangulation. This greedy algorithm finds the nearest neighbor of a query point by traversing the edges of the Delaunay triangulation created from a reference 3D point set. Our work integrates the Delaunay traversal into the correspondences search of ICP and exploits the iterative aspect of ICP by caching previous correspondences to expedite each iteration. An algorithmic analysis and comparison is conducted showing an order of magnitude speedup for both serial and vector processor implementation. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

30. In-Memory Nearest Neighbor Search With Nanoelectromechanical Ternary Content-Addressable Memory.

Author: Lee, Jae Seong, Yoon, Jisoo, and Choi, Woo Young
Subjects: ASSOCIATIVE storage, HAMMING distance, HYBRID integrated circuits, ARTIFICIAL neural networks
Abstract: Nearest neighbor (NN) search is widely used in pattern classification and memory-augmented neural networks. To overcome the von Neumann bottleneck in conventional NN search architecture, in this study, nanoelectromechanical-switch-based ternary content-addressable memory (NEMTCAM) is introduced for the NN classifier. NEMTCAM can calculate the Hamming distance between the input vector and the stored vectors in a parallel search operation. The NEMTCAM operation was experimentally demonstrated. Furthermore, an analytical model for NN search accuracy, including cell-to-cell parasitic resistance, is presented. NEMTCAM can calculate up to 10 Hamming distances in 32-bit words owing to the high current ratio of the NEM memory switches. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

31. A Fast PQ Hash Code Indexing

Author: Shan, Jingsong, Zhang, Yongjun, Jiang, Mingxin, Jin, Chunhua, Zhang, Zhengwei, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Barolli, Leonard, editor, Xhafa, Fatos, editor, Javaid, Nadeem, editor, and Enokido, Tomoya, editor
Published: 2019
Full Text: View/download PDF

32. Fast Filtering for Nearest Neighbor Search by Sketch Enumeration Without Using Matching

Author: Higuchi, Naoya, Imamura, Yasunobu, Kuboyama, Tetsuji, Hirata, Kouichi, Shinohara, Takeshi, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Jixue, editor, and Bailey, James, editor
Published: 2019
Full Text: View/download PDF

33. Fast and Exact Nearest Neighbor Search in Hamming Space on Full-Text Search Engines

Author: Mu, Cun (Matthew), Zhao, Jun (Raymond), Yang, Guang, Yang, Binwei, Yan, Zheng (John), Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Amato, Giuseppe, editor, Gennaro, Claudio, editor, Oria, Vincent, editor, and Radovanović, Miloš, editor
Published: 2019
Full Text: View/download PDF

34. Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search

Author: Jääsaari, Elias, Hyvönen, Ville, Roos, Teemu, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Pandu Rangan, C., Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Yang, Qiang, editor, Zhou, Zhi-Hua, editor, Gong, Zhiguo, editor, Zhang, Min-Ling, editor, and Huang, Sheng-Jun, editor
Published: 2019
Full Text: View/download PDF

35. Component Placement Process Optimization for Multi-head Surface Mounting Machine Using a Hybrid Algorithm.

Author: Cheng-Jian Lin and Chun-Hui Lin
Subjects: ASSEMBLY line balancing, PROCESS optimization, MIXED integer linear programming, QUADRATIC assignment problem
Abstract: The article addresses the optimization problem of printed circuit board (PCB) assembly scheduling for surface mount placement (SMP) machines with multiple heads. Topics include studies have oversimplified the practical problems and limited their focus to three sub-problems; and propose a hybrid algorithm to consider the component height, the pick-and-place restrictions, and simultaneous pickup restrictions.
Published: 2021
Full Text: View/download PDF

36. Toward more efficient locality‐sensitive hashing via constructing novel hash function cluster.

Author: Zhang, Shi, Huang, Jin, Xiao, Ruliang, Du, Xin, Gong, Ping, and Lin, Xinhong
Subjects: MAXIMUM entropy method, VECTOR quantization, DISTRIBUTION (Probability theory), ALGORITHMS
Abstract: Locality‐sensitive hashing (LSH) is widely used in the context of nearest neighbor search of large‐scale high‐dimensions. However, there are serious imbalance problems between the efficiency of data index structure construction and the query accuracy of LSH methods. In this article, a novel higher‐entropy‐hyperplane clusters LSH (HEHC‐LSH) algorithm is proposed, which we improve vector quantization to preprocess the data and greatly shortens the preprocessing time; We innovatively integrate the maximum entropy principle into the distribution estimation algorithm to construct a novel hash function cluster method, also incorporate bootstrap aggregating of ensemble learning, and adopt the parallel index dictionary to improve the generalization performance of the index structure. And in the query stage, we realize the comprehensive filtering of index set using integrated learning idea, which not only avoids a lot of distance calculation, but also improves the quality of query results. We also analyze the rationality and effectiveness of the proposed method. Finally, extensive experiment results show that HEHC‐LSH can achieve more higher precision and efficiency simultaneously comparing to current methods, and reflect the strong robustness on different datasets. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

37. Angular Deep Supervised Vector Quantization for Image Retrieval.

Author: Zhou, Chang, Po, Lai Man, and Ou, Weifeng
Subjects: *VECTOR quantization, *IMAGE retrieval, *CENTROID, *PROBLEM solving
Abstract: Most of the deep quantization methods adopt unsupervised approaches, and the quantization process usually occurs in the Euclidean space on top of the deep feature and its approximate value. When this approach is applied to the retrieval tasks, since the internal product space of the retrieval process is different from the Euclidean space of quantization, minimizing the quantization error (QE) does not necessarily lead to a good performance on the maximum inner product search (MIPS). To solve these problems, we treat Softmax classification as vector quantization (VQ) with angular decision boundaries and propose angular deep supervised VQ (ADSVQ) for image retrieval. Our approach can simultaneously learn the discriminative feature representation and the updatable codebook, both lying on a hypersphere. To reduce the QE between centroids and deep features, two regularization terms are proposed as supervision signals to encourage the intra-class compactness and inter-class balance, respectively. ADSVQ explicitly reformulates the asymmetric distance computation in MIPS to transform the image retrieval process into a two-stage classification process. Moreover, we discuss the extension of multiple-label cases from the perspective of quantization with binary classification. Extensive experiments demonstrate that the proposed ADSVQ has excellent performance on four well-known image data sets when compared with the state-of-the-art hashing methods. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

38. Contribution to the Understanding of Protein–Protein Interface and Ligand Binding Site Based on Hydrophobicity Distribution—Application to Ferredoxin I and II Cases.

Author: Banach, Mateusz, Chomilier, Jacques, and Roterman, Irena
Subjects: LIGAND binding (Biochemistry), BINDING sites, PROTEIN-ligand interactions, PROTEIN structure prediction, PROTEIN-protein interactions, PETROLEUM sales & prices
Abstract: Featured Application: hydrophobicity characteristic of proteins in the sense of fuzzy oil drop model may help predict residues engaged in protein–protein and protein–ligand interaction. Ferredoxin I and II are proteins carrying a specific ligand—an iron-sulfur cluster—which allows transport of electrons. These two classes of ferredoxin in their monomeric and dimeric forms are the object of this work. Characteristic of hydrophobic core in both molecules is analyzed via fuzzy oil drop model (FOD) to show the specificity of their structure enabling the binding of a relatively large ligand and formation of the complex. Structures of FdI and FdII are a promising example for the discussion of influence of hydrophobicity on biological activity but also for an explanation how FOD model can be used as an initial stage adviser (or a scoring function) in the search for locations of ligand binding pockets and protein–protein interaction areas. It is shown that observation of peculiarities in the hydrophobicity distribution present in the molecule (in this case—of a ferredoxin) may provide a promising starting location for computer simulations aimed at the prediction of quaternary structure of proteins. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

39. An efficient evolutionary algorithm with a nearest neighbor search technique for clustering analysis.

Author: Qaddoura, Raneem, Faris, Hossam, and Aljarah, Ibrahim
Abstract: Evolutionary algorithms have shown their powerful capabilities in different machine learning problems including clustering which is a growing area of research nowadays. In this paper, we propose an efficient clustering technique based on the evolution behavior of genetic algorithm and an advanced variant of nearest neighbor search technique based on assignment and election mechanisms. The goal of the proposed algorithm is to improve the quality of clustering results by finding a solution that maximizes the separation between different clusters and maximizes the cohesion between data points in the same cluster. Our proposed algorithm which we refer to as "EvoNP" is tested with 15 well-known data sets using 5 well-known external evaluation measures and is compared with 7 well-regarded clustering algorithms. The experiments are conducted in two phases: evaluation of the best fitness function for the algorithm and evaluation of the algorithm against other clustering algorithms. The results show that the proposed algorithm works well with silhouette coefficient fitness function and outperforms the other algorithms for the majority of the data sets. The source code of EvoNP is available at http://evo-ml.com/evonp/. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

40. Damage detection of 3D structures using nearest neighbor search method.

Author: Abasi, Ali, Harsij, Vahid, and Soraghi, Ahmad
Subjects: *NEAREST neighbor analysis (Statistics), *ARTIFICIAL neural networks, *RANDOM noise theory, *PRINCIPAL components analysis, *WHITE noise
Abstract: An innovative damage identification method using the nearest neighbor search method to assess 3D structures is presented. The frequency response function was employed as the input parameters to detect the severity and place of damage in 3D spaces since it includes the most dynamic characteristics of the structures. Two-dimensional principal component analysis was utilized to reduce the size of the frequency response function data. The nearest neighbor search method was employed to detect the severity and location of damage in different damage scenarios. The accuracy of the approach was verified using measured data from an experimental test; moreover, two asymmetric 3D numerical examples were considered as the numerical study. The superiority of the method was demonstrated through comparison with the results of damage identification by using artificial neural network. Different levels of white Gaussian noise were used for polluting the frequency response function data to investigate the robustness of the methods against noise-polluted data. The results indicate that both methods can efficiently detect the damage properties including its severity and location with high accuracy in the absence of noise, but the nearest neighbor search method is more robust against noisy data than the artificial neural network. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

41. Fast Nearest Neighbor Search Based on Approximate k-NN Graph

Author: Yang, Jie, Zhao, Wan-Lei, Deng, Cheng-Hao, Wang, Hanzi, Moon, Sangwhan, Barbosa, Simone Diniz Junqueira, Series Editor, Chen, Phoebe, Series Editor, Filipe, Joaquim, Series Editor, Kotenko, Igor, Series Editor, Sivalingam, Krishna M., Series Editor, Washio, Takashi, Series Editor, Yuan, Junsong, Series Editor, Zhou, Lizhu, Series Editor, Huet, Benoit, editor, Nie, Liqiang, editor, and Hong, Richang, editor
Published: 2018
Full Text: View/download PDF

42. Efficient and Secure Nearest Neighbor Search Over Encrypted Data in Cloud Environment

Author: Upinder, Kaur, Suri Pushpa, R., Barbosa, Simone Diniz Junqueira, Editorial Board Member, Filipe, Joaquim, Editorial Board Member, Kotenko, Igor, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Chen, Phoebe, Founding Editor, Sivalingam, Krishna M., Founding Editor, Washio, Takashi, Founding Editor, Yuan, Junsong, Founding Editor, Panda, Brajendra, editor, Sharma, Sudeep, editor, and Roy, Nihar Ranjan, editor
Published: 2018
Full Text: View/download PDF

43. Nearest Neighbor Search Techniques Applied in the Nearest Feature Line Classifier

Author: Guo, Fang, Pan, Jeng-Shyang, Kacprzyk, Janusz, Series editor, Pal, Nikhil R., Advisory editor, Bello Perez, Rafael, Advisory editor, Corchado, Emilio S., Advisory editor, Hagras, Hani, Advisory editor, Kóczy, László T., Advisory editor, Kreinovich, Vladik, Advisory editor, Lin, Chin-Teng, Advisory editor, Lu, Jie, Advisory editor, Melin, Patricia, Advisory editor, Nedjah, Nadia, Advisory editor, Nguyen, Ngoc Thanh, Advisory editor, Wang, Jun, Advisory editor, Krömer, Pavel, editor, Alba, Enrique, editor, Pan, Jeng-Shyang, editor, and Snášel, Václav, editor
Published: 2018
Full Text: View/download PDF

44. An Approximate Nearest Neighbor Search Algorithm Using Distance-Based Hashing

Author: Itotani, Yuri, Wakabayashi, Shin’ichi, Nagayama, Shinobu, Inagi, Masato, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Hartmann, Sven, editor, Ma, Hui, editor, Hameurlain, Abdelkader, editor, Pernul, Günther, editor, and Wagner, Roland R., editor
Published: 2018
Full Text: View/download PDF

45. A Fast Two-Level Approximate Euclidean Minimum Spanning Tree Algorithm for High-Dimensional Data

Author: Wang, Xia Li, Wang, Xiaochun, Li, Xiaqiong, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, and Perner, Petra, editor
Published: 2018
Full Text: View/download PDF

46. An Efficient Exact Nearest Neighbor Search by Compounded Embedding

Author: Li, Mingjie, Zhang, Ying, Sun, Yifang, Wang, Wei, Tsang, Ivor W., Lin, Xuemin, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Pandu Rangan, C., Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Weikum, Gerhard, Series Editor, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Pei, Jian, editor, Manolopoulos, Yannis, editor, Sadiq, Shazia, editor, and Li, Jianxin, editor
Published: 2018
Full Text: View/download PDF

47. Learning to Index in Large-Scale Datasets

Author: Prayoonwong, Amorntip, Wang, Cheng-Hsien, Chiu, Chih-Yi, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Schoeffmann, Klaus, editor, Chalidabhongse, Thanarat H., editor, Ngo, Chong Wah, editor, Aramvith, Supavadee, editor, O’Connor, Noel E., editor, Ho, Yo-Sung, editor, Gabbouj, Moncef, editor, and Elgammal, Ahmed, editor
Published: 2018
Full Text: View/download PDF

48. A scalable solution to the nearest neighbor search problem through local-search methods on neighbor graphs.

Author: Tellez, Eric S., Ruiz, Guillermo, Chavez, Edgar, and Graff, Mario
Subjects: *K-nearest neighbor classification, *PROBLEM solving, *METAHEURISTIC algorithms, *COMBINATORIAL optimization
Abstract: Nearest neighbor search is a powerful abstraction for data access; however, data indexing is troublesome even for approximate indexes. For intrinsically high-dimensional data, high-quality fast searches demand either indexes with impractically large memory usage or preprocessing time. In this paper, we introduce an algorithm to solve a nearest-neighbor query q by minimizing a kernel function defined by the distance from q to each object in the database. The minimization is performed using metaheuristics to solve the problem rapidly; even when some methods in the literature use this strategy behind the scenes, our approach is the first one using it explicitly. We also provide two approaches to select edges in the graph's construction stage that limit memory footprint and reduce the number of free parameters simultaneously. We carry out a thorough experimental comparison with state-of-the-art indexes through synthetic and real-world datasets; we found out that our contributions achieve competitive performances regarding speed, accuracy, and memory in almost any of our benchmarks. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

49. Data efficient indoor thermal comfort prediction using instance based transfer learning method.

Author: Li, Kangji, Liu, Yufei, Chen, Lei, and Xue, Wenping
Abstract: The precise prediction of indoor thermal sensation is critical to building energy efficiency and satisfaction of occupants. In this field, data-driven models have shown superior performance than Predicted Mean Vote (PMV) model. However, a reliable and accurate model of this kind relies on sufficient and high-quality data from physiological and environmental sensing, which in practice is difficult or expensive to collect. Rather than developing a separated predictive model for certain task, transfer learning method could intelligently use knowledge from similar domains to solve the target problem, and thus the requirement of data resources is reduced. In the last few years, transfer learning has been introduced into the field of thermal comfort prediction. Most of them used the source domain data as a whole for model pre-training, and the samples' suitability judgment was not fully considered. But the general situation in this field is: there are few public and available data resources and their data distributions are hardly consistent with that of specific target task. Aiming at the issue, this study proposes an instance based transfer learning model, which is based on data weighting and reusing for data efficient thermal comfort prediction. We assume that samples from source and target domains obey different distributions but have some common features that can be transferred across domains. The Nearest Neighbor Search (NNS) is used for similarity judgement and selection of source domain data; A Boosting type transfer learning algorithm, i.e., TrAdaBoost, is improved for reliable thermal comfort prediction. In the algorithm, an automatic weighting mechanism could optimally adjust the source training data's weights for performance improvement. By using field experimental data set and a public data set as target and source domains data respectively, the performance of the proposed model is investigated. Results show when target domain data is insufficient, the proposed NNS-iTrAdaBoost method has superior transfer learning ability and could achieve much better performance than traditional data-driven models. In practice, the proposed method is easy for engineering implementation and has great potential for thermal comfort prediction based on small-scale experiments. • An improved transfer learning algorithm, called iTrAdaBoost, is designed for indoor thermal comfort prediction. • The Nearest Neighbor Search method is used for similarity judgement and selection of source domain data. • A field thermal comfort experiment in an office room is conducted for target domain data collection. • The performance of proposed method is investigated and compared with other data-driven models. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

50. SWINN: Efficient nearest neighbor search in sliding windows using graphs.

Author: Mastelini, Saulo Martiello, Veloso, Bruno, Halford, Max, de Carvalho, André Carlos Ponce de Leon Ferreira, and Gama, João
Subjects: *MACHINE learning, *SEARCH algorithms, *CLASSIFICATION algorithms, *ONLINE algorithms, *ONLINE education, *ELECTRONIC information resource searching
Abstract: Nearest neighbor search (NNS) is one of the main concerns in data stream applications since similarity queries can be used in multiple scenarios. Online NNS is usually performed on a sliding window by lazily scanning every element currently stored in the window. This paper proposes Sliding Window-based Incremental Nearest Neighbors (SWINN), a graph-based online search index algorithm for speeding up NNS in potentially never-ending and dynamic data stream tasks. Our proposal broadens the application of online NNS-based solutions, as even moderately large data buffers become impractical to handle when a naive NNS strategy is selected. SWINN enables efficient handling of large data buffers by using an incremental strategy to build and update a search graph supporting any distance metric. Vertices can be added and removed from the search graph. To keep the graph reliable for search queries, lightweight graph maintenance routines are run. According to experimental results, SWINN is significantly faster than performing a naive complete scan of the data buffer while keeping competitive search recall values. We also apply SWINN to online classification and regression tasks and show that our proposal is effective against popular online machine learning algorithms. • We introduce SWINN, a graph-based nearest neighbor search index for sliding windows. • SWINN works with any distance metric and is robust against concept drift. • SWINN is significantly faster than a linear scan of the data buffer. • Our proposal delivers competitive search recalls compared to the traditional strategy. • SWINN is successfully applied to classification and regression case studies. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

5,426 results on '"Nearest neighbor search"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources