5,426 results on '"Nearest neighbor search"'
Search Results
2. Revisiting Nearest-Neighbor-Based Information Set Decoding
- Author
-
Esser, Andre, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, and Quaglia, Elizabeth A., editor
- Published
- 2024
- Full Text
- View/download PDF
3. Storage and Management of Ship Position Based on Geographic Grid Coding and Its Efficiency Analysis in Neighborhood Search—A Case Study of Shipwreck Rescue and Google S2.
- Author
-
Jiang, Bohui, Zhou, Weifeng, and Han, Haibin
- Subjects
COLLISIONS at sea ,SHIPWRECKS ,NEIGHBORHOODS ,EUCLIDEAN distance ,SHIPS ,RESCUE work ,FISHERIES - Abstract
The marine fishery is a high-risk industry, and when a fishing vessel is in distress at sea, it is essential to protect the lives of the fishermen by implementing a fast and effective rescue for the vessel in distress. In this process, quickly retrieving the closest neighboring vessel in distress based on an effective rescue time is critical to the vessel's rescue. In order to improve the time efficiency of retrieving the nearest available rescue vessels in the current sea area, this paper proposes a method based on the Google S2 grid coding for ships in distress at sea. The method is divided into three parts: (1) encoding the ship's position based on the Google S2 algorithm, (2) retrieving the set of available ships in the current sea area based on the effective rescue distance and its corresponding coding level, and (3) sorting the set of available ships in the current sea area according to the proximity to the ship in distress by using the method of "Alternating Grid Sorting Based on Neighborhoods and Different Coding Levels". The effective rescue distance is set by the rescue time, the type of rescue vessel, and the speed. This paper sets the simulation experiment area as the East China Sea area. Different magnitudes of ship position datasets (1 × 10
2 , 1 × 103 , 1 × 104 , 1 × 105 , 1 × 106 ) are generated by simulating the scenarios where a ship's location is reported or collected by AIS or VMS. The temporal retrieval efficiencies of querying based on the two methods, the Euclidean distance and the Google S2 grid encoding, are compared and analyzed. The experimental results show that the total time consumed by the query method based on the Google S2 grid encoding and the query method based on Euclidean distance is reduced by 37.06%, 29.83%, 72.75%, 94.43%, and 94.53%, respectively, in the process of generating the set of rescuable ships retrieved, based on the effective rescue distance. Therefore, the time retrieval efficiency of the maritime vessel search and rescue method based on the Google S2 coding is high, which can effectively improve the query efficiency of rescue vessels in the neighborhood of distressed vessels. [ABSTRACT FROM AUTHOR]- Published
- 2024
- Full Text
- View/download PDF
4. 基于容忍因子的近似最近邻 混合查询算法.
- Author
-
贺广福, 薛源海, 陈翠婷, 俞晓明, 刘欣然, and 程学旗
- Abstract
Approximate nearest neighbor search (ANNS) is an important technique in the field of computer science for efficient similarity search, enabling fast information retrieval in large-scale datasets. With the increasing demand for high-precision information retrieval, there is a growing use of the hybrid query method that utilizes both structured and unstructured information for querying, which has broad application prospects. However, filtered greedy algorithms based on nearest neighbor graphs may reduce the connectivity of the graph due to the impact of structural constraints in hybrid queries, ultimately damaging search accuracy. This article proposes a tolerance factor based filtered greedy algorithm, which controls the participation of vertices that do not meet structural constraints in routing through a tolerance factor, maintaining the connectivity of the original nearest neighbor graph without changing the index structure, and overcoming the negative impact of structural constraints on retrieval accuracy. The experimental results demonstrate that the new method can achieve high-precision search for ANNS under different levels of structural constraints, while maintaining retrieval efficiency. This study solves the problem of ANNS based on nearest neighbor graphs in hybrid query scenarios, providing an effective solution for fast hybrid query information retrieval in large-scale datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. LFPeers: Temporal similarity search and result exploration.
- Author
-
Sachdeva, Madhav, Burmeister, Jan, Kohlhammer, Jörn, and Bernard, Jürgen
- Subjects
- *
CONCEPT learning , *VISUAL analytics , *BEHAVIORAL assessment , *COVID-19 pandemic , *PEERS , *NATURAL gas prospecting - Abstract
In this paper, we introduce a general concept for the analysis of temporal and multivariate data and the system LFPeers that applies this concept to temporal similarity search and results exploration. The conceptual workflow divides the analysis in two phases: a search phase to find the most similar objects to a query object before a time point t 0 in the temporal data, and an exploration phase to analyze and contextualize this subset of objects after t 0. LFPeers enables users to search for peers through interactive similarity search and filtering, explore interesting behavior of this peer group, and learn from peers through the assessment of diverging behaviors. We present the conceptual workflow to learn from peers and the LFPeers system with novel interfaces for search and exploration in temporal and multivariate data. An earlier workshop publication for LFPeers included a usage scenario targeting epidemiologists and the public who want to learn from the Covid-19 pandemic and distinguish successful and ineffective measures. In this extended paper, we now show how our concept is generalized and applied by domain experts in two case studies, including a novel case on stocks data. Finally, we reflect on the new state of development and on the insights gained by the experts in the case studies on the search and exploration of temporal data to learn from peers. [Display omitted] • Conceptual framework for temporal and multimodal similarity search and exploration. • Visual Analytics system for cause–effect analysis on temporal data. • Interactive user-defined search and exploration of data objects by similarity. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
6. 基于优先级队列的隧道无序点云快速法线全局定向方法.
- Author
-
李 纯, 黄新文, 周清华, 薛宇腾, 杨璟林, and 宋 浩
- Abstract
Copyright of Railway Standard Design is the property of Railway Standard Design Editorial Office and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
- Full Text
- View/download PDF
7. LayerLSB:Nearest Neighbors Search Based on Layered Locality Sensitive B-tree
- Author
-
DING Jiwen, LIU Zhuojin, WANG Jiaxing, ZHANG Yanfeng, YU Ge
- Subjects
nearest neighbor search ,layered structure ,locality sensitive hashing ,locality sensitive b-tree ,Computer software ,QA76.75-76.765 ,Technology (General) ,T1-995 - Abstract
Nearest neighbor search has become a significant research problem due to its wide applications.Traditional spatial index structures such as R-tree and KD-tree can efficiently return accurate nearest neighbor search results in low-dimensional space,but they are not suitable for high-dimensional space.Locality sensitive B-tree(LSB) hashes data points to the sortable one-dimension values and arranges them in a tree-like structure,which dramatically improves the space and query efficiency of the previous locality sensitive hashing(LSH) implementations,without compromising the resulting quality.However,LSB fails to take data distribution into account.It performs well in a uniform data distribution setting,but exhibits unstable performance when the data are skewed.In response to this problem,this paper proposes LayerLSB,which reconstructs the hash values in a dense range by exploring the density of the hash values to make the distribution more uniform,so as to improve the query efficiency.Compared to LSB,LayerLSB indices become more targeted in terms of data distribution,and a multi-layered structure is constructed.Compared with the simple rehashing method,the multi-layered approach will still guarantee the search quality by carefully choosing the number of groups and hash functions.The results show that the query cost can be reduced to 44.6% of the original at most when achieving the same query accuracy.
- Published
- 2023
- Full Text
- View/download PDF
8. Storage and Management of Ship Position Based on Geographic Grid Coding and Its Efficiency Analysis in Neighborhood Search—A Case Study of Shipwreck Rescue and Google S2
- Author
-
Bohui Jiang, Weifeng Zhou, and Haibin Han
- Subjects
ship rescue ,effective rescue time ,gridding ,ship position code ,neighborhood search ,nearest neighbor search ,Technology ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Biology (General) ,QH301-705.5 ,Physics ,QC1-999 ,Chemistry ,QD1-999 - Abstract
The marine fishery is a high-risk industry, and when a fishing vessel is in distress at sea, it is essential to protect the lives of the fishermen by implementing a fast and effective rescue for the vessel in distress. In this process, quickly retrieving the closest neighboring vessel in distress based on an effective rescue time is critical to the vessel’s rescue. In order to improve the time efficiency of retrieving the nearest available rescue vessels in the current sea area, this paper proposes a method based on the Google S2 grid coding for ships in distress at sea. The method is divided into three parts: (1) encoding the ship’s position based on the Google S2 algorithm, (2) retrieving the set of available ships in the current sea area based on the effective rescue distance and its corresponding coding level, and (3) sorting the set of available ships in the current sea area according to the proximity to the ship in distress by using the method of “Alternating Grid Sorting Based on Neighborhoods and Different Coding Levels”. The effective rescue distance is set by the rescue time, the type of rescue vessel, and the speed. This paper sets the simulation experiment area as the East China Sea area. Different magnitudes of ship position datasets (1 × 102, 1 × 103, 1 × 104, 1 × 105, 1 × 106) are generated by simulating the scenarios where a ship’s location is reported or collected by AIS or VMS. The temporal retrieval efficiencies of querying based on the two methods, the Euclidean distance and the Google S2 grid encoding, are compared and analyzed. The experimental results show that the total time consumed by the query method based on the Google S2 grid encoding and the query method based on Euclidean distance is reduced by 37.06%, 29.83%, 72.75%, 94.43%, and 94.53%, respectively, in the process of generating the set of rescuable ships retrieved, based on the effective rescue distance. Therefore, the time retrieval efficiency of the maritime vessel search and rescue method based on the Google S2 coding is high, which can effectively improve the query efficiency of rescue vessels in the neighborhood of distressed vessels.
- Published
- 2024
- Full Text
- View/download PDF
9. PLDH: Pseudo-Labels Based Deep Hashing.
- Author
-
Liu, Huawen, Yin, Minhao, Wu, Zongda, Zhao, Liping, Li, Qi, Zhu, Xinzhong, and Zheng, Zhonglong
- Subjects
- *
CONVOLUTIONAL neural networks , *DEEP learning , *IMAGE retrieval , *FEATURE selection - Abstract
Deep hashing has received a great deal of attraction in large-scale data analysis, due to its high efficiency and effectiveness. The performance of deep hashing models heavily relies on label information, which is very expensive to obtain. In this work, a novel end-to-end deep hashing model based on pseudo-labels for large-scale data without labels is proposed. The proposed hashing model consists of two major stages, where the first stage aims to obtain pseudo-labels based on deep features extracted by a pre-training deep convolution neural network. The second stage generates hash codes with high quality by the same neural network in the previous stage, coupled with an end-to-end hash layer, whose purpose is to encode data into a binary representation. Additionally, a quantization loss is introduced and interwound within these two stages. Evaluation experiments were conducted on two frequently-used image collections, CIFAR-10 and NUS-WIDE, with eight popular shallow and deep hashing models. The experimental results show the superiority of the proposed method in image retrieval. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
10. Approximate Nearest Neighbor for Curves: Simple, Efficient, and Deterministic.
- Author
-
Filtser, Arnold, Filtser, Omrit, and Katz, Matthew J.
- Subjects
- *
DATA structures , *CURVES , *APPROXIMATE reasoning - Abstract
In the (1 + ε , r) -approximate near-neighbor problem for curves (ANNC) under some similarity measure δ , the goal is to construct a data structure for a given set C of curves that supports approximate near-neighbor queries: Given a query curve Q, if there exists a curve C ∈ C such that δ (Q , C) ≤ r , then return a curve C ′ ∈ C with δ (Q , C ′) ≤ (1 + ε) r . There exists an efficient reduction from the (1 + ε) -approximate nearest-neighbor problem to ANNC, where in the former problem the answer to a query is a curve C ∈ C with δ (Q , C) ≤ (1 + ε) · δ (Q , C ∗) , where C ∗ is the curve of C most similar to Q. Given a set C of n curves, each consisting of m points in d dimensions, we construct a data structure for ANNC that uses n · O (1 ε) md storage space and has O(md) query time (for a query curve of length m), where the similarity measure between two curves is their discrete Fréchet or dynamic time warping distance. Our method is simple to implement, deterministic, and results in an exponential improvement in both query time and storage space compared to all previous bounds. Further, we also consider the asymmetric version of ANNC, where the length of the query curves is k ≪ m , and obtain essentially the same storage and query bounds as above, except that m is replaced by k. Finally, we apply our method to a version of approximate range counting for curves and achieve similar bounds. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
11. DIOT: Detecting Implicit Obstacles from Trajectories
- Author
-
Lei, Yifan, Huang, Qiang, Kankanhalli, Mohan, Tung, Anthony, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bhattacharya, Arnab, editor, Lee Mong Li, Janice, editor, Agrawal, Divyakant, editor, Reddy, P. Krishna, editor, Mohania, Mukesh, editor, Mondal, Anirban, editor, Goyal, Vikram, editor, and Uday Kiran, Rage, editor
- Published
- 2022
- Full Text
- View/download PDF
12. 基于车载LiDAR点云的路边地上物 多阶段聚类分割算法.
- Author
-
李康弘, 李永强, 李佳佳, 任京智, 郝道前, and 王治尧
- Subjects
POINT cloud ,STREET signs ,STREET lighting ,URBAN trees ,SEARCH algorithms - Abstract
Copyright of Geography & Geographic Information Science is the property of Geography & Geo-Information Science Editorial Office and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
- Full Text
- View/download PDF
13. Improved Assessment of Globularity of Protein Structures and the Ellipsoid Profile of the Biological Assemblies from the PDB.
- Author
-
Banach, Mateusz
- Subjects
- *
PROTEIN structure , *PROBABILITY density function , *ELLIPSOIDS , *PRINCIPAL components analysis , *OUTLIER detection , *DATABASES - Abstract
In this paper, we present an update to the ellipsoid profile algorithm (EP), a simple technique for the measurement of the globularity of protein structures without the calculation of molecular surfaces. The globularity property is understood in this context as the ability of the molecule to fill a minimum volume enclosing ellipsoid (MVEE) that approximates its assumed globular shape. The more of the interior of this ellipsoid is occupied by the atoms of the protein, the better are its globularity metrics. These metrics are derived from the comparison of the volume of the voxelized representation of the atoms and the volume of all voxels that can fit inside that ellipsoid (a uniform unit Å cube lattice). The so-called ellipsoid profile shows how the globularity changes with the distance from the center. Two of its values, the so-called ellipsoid indexes, are used to classify the structure as globular, semi-globular or non-globular. Here, we enhance the workflow of the EP algorithm via an improved outlier detection subroutine based on principal component analysis. It is capable of robust distinguishing between the dense parts of the molecules and, for example, disordered chain fragments fully exposed to the solvent. The PCA-based method replaces the current approach based on kernel density estimation. The improved EP algorithm was tested on 2124 representatives of domain superfamilies from SCOP 2.08. The second part of this work is dedicated to the survey of globularity of 3594 representatives of biological assemblies from molecules currently deposited in the PDB and analyzed by the 3DComplex database (monomers and complexes up to 60 chains). [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
14. Locality sensitive hashing with bit selection.
- Author
-
Zhou, Wenhua, Liu, Huawen, Lou, Jungang, and Chen, Xin
- Subjects
BINARY codes ,IMAGE retrieval - Abstract
Locality sensitive hashing (LSH), one of the most popular hashing techniques, has attracted considerable attention for nearest neighbor search in the field of image retrieval. It can achieve promising performance only if the number of the generated hash bits is large enough. However, more hash bits assembled to the binary codes contain massive redundant information and require more time cost and storage spaces. To alleviate this limitation, we propose a novel bit selection framework to pick important bits out of the hash bits generated by hashing techniques. Within the bit selection framework, we further exploit eleven evaluation criteria to measure the importance and similarity of each bit generated by LSH, so that the bits with high importance and less similarity are selected to assemble new binary codes. To demonstrate the effectiveness of the proposed framework of bit selection, we evaluated the proposed framework with the evaluation criteria on five commonly used data sets. Experimental results show the proposed bit selection framework works effectively in different cases, and the performance of LSH has not been degraded significantly after redundant hash bits reduced by the evaluation criteria. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
15. The Descriptiveness of Feature Descriptors with Reduced Dimensionality
- Author
-
Varga, Dániel, Szalai-Gindl, János Márk, Laki, Sándor, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Bellatreche, Ladjel, editor, Dumas, Marlon, editor, Karras, Panagiotis, editor, Matulevičius, Raimundas, editor, Awad, Ahmed, editor, Weidlich, Matthias, editor, Ivanović, Mirjana, editor, and Hartig, Olaf, editor
- Published
- 2021
- Full Text
- View/download PDF
16. Approximate Nearest Neighbor Search Using Query-Directed Dense Graph
- Author
-
Wang, Hongya, Zhao, Zeng, Yang, Kaixiang, Song, Hui, Xiao, Yingyuan, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Jensen, Christian S., editor, Lim, Ee-Peng, editor, Yang, De-Nian, editor, Chang, Chia-Hui, editor, Xu, Jianliang, editor, Peng, Wen-Chih, editor, Huang, Jen-Wei, editor, and Shen, Chih-Ya, editor
- Published
- 2021
- Full Text
- View/download PDF
17. Nearest neighbor search on embeddings rapidly identifies distant protein relations
- Author
-
Konstantin Schütze, Michael Heinzinger, Martin Steinegger, and Burkhard Rost
- Subjects
homology search ,protein embeddings ,language models ,nearest neighbor search ,remote homology detection ,Computer applications to medicine. Medical informatics ,R858-859.7 - Abstract
Since 1992, all state-of-the-art methods for fast and sensitive identification of evolutionary, structural, and functional relations between proteins (also referred to as “homology detection”) use sequences and sequence-profiles (PSSMs). Protein Language Models (pLMs) generalize sequences, possibly capturing the same constraints as PSSMs, e.g., through embeddings. Here, we explored how to use such embeddings for nearest neighbor searches to identify relations between protein pairs with diverged sequences (remote homology detection for levels of
- Published
- 2022
- Full Text
- View/download PDF
18. A Data-Integration Analysis on Road Emissions and Traffic Patterns
- Author
-
Qu, Ao, Wang, Yu, Hu, Yue, Wang, Yanbing, Baroud, Hiba, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Nichols, Jeffrey, editor, Verastegui, Becky, editor, Maccabe, Arthur ‘Barney’, editor, Hernandez, Oscar, editor, Parete-Koon, Suzanne, editor, and Ahearn, Theresa, editor
- Published
- 2020
- Full Text
- View/download PDF
19. Confirmation Sampling for Exact Nearest Neighbor Search
- Author
-
Christiani, Tobias, Pagh, Rasmus, Thorup, Mikkel, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Satoh, Shin'ichi, editor, Vadicamo, Lucia, editor, Zimek, Arthur, editor, Carrara, Fabio, editor, Bartolini, Ilaria, editor, Aumüller, Martin, editor, Jónsson, Björn Þór, editor, and Pagh, Rasmus, editor
- Published
- 2020
- Full Text
- View/download PDF
20. Towards Proximity Graph Auto-configuration: An Approach Based on Meta-learning
- Author
-
Oyamada, Rafael Seidi, Shimomura, Larissa C., Junior, Sylvio Barbon, Kaster, Daniel S., Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Darmont, Jérôme, editor, Novikov, Boris, editor, and Wrembel, Robert, editor
- Published
- 2020
- Full Text
- View/download PDF
21. Probabilistic analysis of vantage point trees
- Author
-
Vladyslav Bohun
- Subjects
Fixed-point equation ,machine learning ,Markov chain ,nearest neighbor search ,probabilistic analysis ,random tree ,Applied mathematics. Quantitative methods ,T57-57.97 ,Mathematics ,QA1-939 - Abstract
Probabilistic properties of vantage point trees are studied. A vp-tree built from a sequence of independent identically distributed points in ${[-1,\hspace{0.1667em}1]^{d}}$ with the ${\ell _{\infty }}$-distance function is considered. The length of the leftmost path in the tree, as well as partitions over the space it produces are analyzed. The results include several convergence theorems regarding these characteristics, as the number of nodes in the tree tends to infinity.
- Published
- 2021
- Full Text
- View/download PDF
22. PLDH: Pseudo-Labels Based Deep Hashing
- Author
-
Huawen Liu, Minhao Yin, Zongda Wu, Liping Zhao, Qi Li, Xinzhong Zhu, and Zhonglong Zheng
- Subjects
learning to hash ,image retrieval ,deep learning ,nearest neighbor search ,unsupervised learning ,pseudo-label ,Mathematics ,QA1-939 - Abstract
Deep hashing has received a great deal of attraction in large-scale data analysis, due to its high efficiency and effectiveness. The performance of deep hashing models heavily relies on label information, which is very expensive to obtain. In this work, a novel end-to-end deep hashing model based on pseudo-labels for large-scale data without labels is proposed. The proposed hashing model consists of two major stages, where the first stage aims to obtain pseudo-labels based on deep features extracted by a pre-training deep convolution neural network. The second stage generates hash codes with high quality by the same neural network in the previous stage, coupled with an end-to-end hash layer, whose purpose is to encode data into a binary representation. Additionally, a quantization loss is introduced and interwound within these two stages. Evaluation experiments were conducted on two frequently-used image collections, CIFAR-10 and NUS-WIDE, with eight popular shallow and deep hashing models. The experimental results show the superiority of the proposed method in image retrieval.
- Published
- 2023
- Full Text
- View/download PDF
23. Efficient Retrieval of Music Recordings Using Graph-Based Index Structures
- Author
-
Frank Zalkow, Julian Brandner, and Meinard Müller
- Subjects
indexing ,music information retrieval ,nearest neighbor search ,efficiency ,runtime ,Applied mathematics. Quantitative methods ,T57-57.97 - Abstract
Flexible retrieval systems are required for conveniently browsing through large music collections. In a particular content-based music retrieval scenario, the user provides a query audio snippet, and the retrieval system returns music recordings from the collection that are similar to the query. In this scenario, a fast response from the system is essential for a positive user experience. For realizing low response times, one requires index structures that facilitate efficient search operations. One such index structure is the K-d tree, which has already been used in music retrieval systems. As an alternative, we propose to use a modern graph-based index, denoted as Hierarchical Navigable Small World (HNSW) graph. As our main contribution, we explore its potential in the context of a cross-version music retrieval application. In particular, we report on systematic experiments comparing graph- and tree-based index structures in terms of the retrieval quality, disk space requirements, and runtimes. Despite the fact that the HNSW index provides only an approximate solution to the nearest neighbor search problem, we demonstrate that it has almost no negative impact on the retrieval quality in our application. As our main result, we show that the HNSW-based retrieval is several orders of magnitude faster. Furthermore, the graph structure also works well with high-dimensional index items, unlike the tree-based structure. Given these merits, we highlight the practical relevance of the HNSW graph for music information retrieval (MIR) applications.
- Published
- 2021
- Full Text
- View/download PDF
24. Is it possible to find the single nearest neighbor of a query in high dimensions?
- Author
-
Ting, Kai Ming, Washio, Takashi, Zhu, Ye, Xu, Yang, and Zhang, Kaifeng
- Subjects
- *
PROBABILITY density function , *OPEN-ended questions , *CLASSIFICATION - Abstract
We investigate an open question in the study of the curse of dimensionality: Is it possible to find the single nearest neighbor of a query in high dimensions? Using the notion of (in)distinguishability to examine whether the feature map of a kernel is able to distinguish two distinct points in high dimensions, we analyze this ability of a metric-based Lipschitz continuous kernel as well as that of the recently introduced Isolation Kernel. Between the two kernels, we show that only Isolation Kernel has distinguishability and it performs consistently well in four tasks: indexed search for exact nearest neighbor search, anomaly detection using kernel density estimation, t-SNE visualization and SVM classification in both low and high dimensions, compared with distance, Gaussian and three other existing kernels. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. QuadScatter: Computational Efficiency in Simultaneous Transmissions for Large-Scale IoT Backscatter Networks
- Author
-
Ousmane Zeba, Kentaro Hayashi, Kazuhiro Kizaki, Shunsuke Saruwatari, and Takashi Watanabe
- Subjects
Computational cost ,simultaneous backscatter transmissions ,quadtree ,quadtree quadrant ,nearest neighbor search ,exhaustive search ,Electronic computers. Computer science ,QA75.5-76.95 ,Information technology ,T58.5-58.64 - Abstract
Backscatter communication has attracted attention owing to its ultra-low-power consumption ability, which is expected to enhance internet of things (IoT) technology that aims to enable many novel applications for object-to-object communication. Such a network with a large and continuously increasing number of connected objects will benefit significantly from resource-saving. This work introduces a system named QuadScatter, which is a set of algorithms that select and associate transmitters, tags and readers to enable simultaneous backscatter transmissions and increase network capacity. Consequently, the energy consumption in the network is considerably lessened. Intensive simulations have been conducted to demonstrate the effectiveness of backscatter simultaneous transmissions. QuadScatter shows promising results compared to the exhaustive search algorithm. The simulation results highlight computational time and simultaneous transmission improvements of at least $ 250\rm \times$ and $ 2\rm \times$, respectively. Furthermore, while the exhaustive search is limited to a few nodes ($ < 20$), our proposal uses numerous nodes. Additionally, an implementation of limited simultaneous backscatter transmissions is conducted to show its feasibility in the real world.
- Published
- 2021
- Full Text
- View/download PDF
26. Improved Assessment of Globularity of Protein Structures and the Ellipsoid Profile of the Biological Assemblies from the PDB
- Author
-
Mateusz Banach
- Subjects
bioinformatics ,biological assembly ,bounding ellipsoid ,globularity ,kernel density ,nearest neighbor search ,Microbiology ,QR1-502 - Abstract
In this paper, we present an update to the ellipsoid profile algorithm (EP), a simple technique for the measurement of the globularity of protein structures without the calculation of molecular surfaces. The globularity property is understood in this context as the ability of the molecule to fill a minimum volume enclosing ellipsoid (MVEE) that approximates its assumed globular shape. The more of the interior of this ellipsoid is occupied by the atoms of the protein, the better are its globularity metrics. These metrics are derived from the comparison of the volume of the voxelized representation of the atoms and the volume of all voxels that can fit inside that ellipsoid (a uniform unit Å cube lattice). The so-called ellipsoid profile shows how the globularity changes with the distance from the center. Two of its values, the so-called ellipsoid indexes, are used to classify the structure as globular, semi-globular or non-globular. Here, we enhance the workflow of the EP algorithm via an improved outlier detection subroutine based on principal component analysis. It is capable of robust distinguishing between the dense parts of the molecules and, for example, disordered chain fragments fully exposed to the solvent. The PCA-based method replaces the current approach based on kernel density estimation. The improved EP algorithm was tested on 2124 representatives of domain superfamilies from SCOP 2.08. The second part of this work is dedicated to the survey of globularity of 3594 representatives of biological assemblies from molecules currently deposited in the PDB and analyzed by the 3DComplex database (monomers and complexes up to 60 chains).
- Published
- 2023
- Full Text
- View/download PDF
27. Tuning Database-Friendly Random Projection Matrices for Improved Distance Preservation on Specific Data.
- Author
-
López-Sánchez, Daniel, de Bodt, Cyril, Lee, John A., Arrieta, Angélica González, and Corchado, Juan M.
- Subjects
RANDOM matrices ,DIMENSIONAL reduction algorithms ,SEARCH algorithms - Abstract
Random Projection is one of the most popular and successful dimensionality reduction algorithms for large volumes of data. However, given its stochastic nature, different initializations of the projection matrix can lead to very different levels of performance. This paper presents a guided random search algorithm to mitigate this problem. The proposed method uses a small number of training data samples to iteratively adjust a projection matrix, improving its performance on similarly distributed data. Experimental results show that projection matrices generated with the proposed method result in a better preservation of distances between data samples. Conveniently, this is achieved while preserving the database-friendliness of the projection matrix, as it remains sparse and comprised exclusively of integers after being tuned with our algorithm. Moreover, running the proposed algorithm on a consumer-grade CPU requires only a few seconds. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
28. Collaborative filtering recommendation based on local sensitive hash in blockchain environment.
- Author
-
WANG Jing and QIAN Xiao-dong
- Abstract
To solve the issue of low recommendation performance caused by massive high-dimensional data in the blockchain environment, the local sensitive hash algorithm is optimized to reduce the calculation and storage overhead in the nearest neighbor search process. The principal component of the data distribution is used to reduce the poorly captured projection direction in the traditional LSH. Meanwhile, the projection vector weight is quantified, the interval of the hash bucket is adjusted, and the query result set is further refined according to the number of conflicts. Finally, a weighted average strategy is used to predict the score and generate a recommendation list. Experiments show that, compared with other algorithm indexes, the optimized LSH only needs a small amount of hash tables and hash functions to obtain accurate neighbor search results, and the search efficiency is greatly improved. The optimized LSH can well adapt to the characteristics of blockchain data, alleviate the impact of high-dimensional large-scale data on recommendation performance, and improve the recommendation quality and efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
29. Delaunay walk for fast nearest neighbor: accelerating correspondence matching for ICP.
- Author
-
Anderson, James D., Raettig, Ryan M., Larson, Josh, Nykl, Scott L., Taylor, Clark N., and Wischgoll, Thomas
- Abstract
Point set registration algorithms such as Iterative Closest Point (ICP) are commonly utilized in time-constrained environments like robotics. Finding the nearest neighbor of a point in a reference 3D point set is a common operation in ICP and frequently consumes at least 90% of the computation time. We introduce a novel approach to performing the distance-based nearest neighbor step based on Delaunay triangulation. This greedy algorithm finds the nearest neighbor of a query point by traversing the edges of the Delaunay triangulation created from a reference 3D point set. Our work integrates the Delaunay traversal into the correspondences search of ICP and exploits the iterative aspect of ICP by caching previous correspondences to expedite each iteration. An algorithmic analysis and comparison is conducted showing an order of magnitude speedup for both serial and vector processor implementation. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
30. In-Memory Nearest Neighbor Search With Nanoelectromechanical Ternary Content-Addressable Memory.
- Author
-
Lee, Jae Seong, Yoon, Jisoo, and Choi, Woo Young
- Subjects
ASSOCIATIVE storage ,HAMMING distance ,HYBRID integrated circuits ,ARTIFICIAL neural networks - Abstract
Nearest neighbor (NN) search is widely used in pattern classification and memory-augmented neural networks. To overcome the von Neumann bottleneck in conventional NN search architecture, in this study, nanoelectromechanical-switch-based ternary content-addressable memory (NEMTCAM) is introduced for the NN classifier. NEMTCAM can calculate the Hamming distance between the input vector and the stored vectors in a parallel search operation. The NEMTCAM operation was experimentally demonstrated. Furthermore, an analytical model for NN search accuracy, including cell-to-cell parasitic resistance, is presented. NEMTCAM can calculate up to 10 Hamming distances in 32-bit words owing to the high current ratio of the NEM memory switches. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
31. A Fast PQ Hash Code Indexing
- Author
-
Shan, Jingsong, Zhang, Yongjun, Jiang, Mingxin, Jin, Chunhua, Zhang, Zhengwei, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Barolli, Leonard, editor, Xhafa, Fatos, editor, Javaid, Nadeem, editor, and Enokido, Tomoya, editor
- Published
- 2019
- Full Text
- View/download PDF
32. Fast Filtering for Nearest Neighbor Search by Sketch Enumeration Without Using Matching
- Author
-
Higuchi, Naoya, Imamura, Yasunobu, Kuboyama, Tetsuji, Hirata, Kouichi, Shinohara, Takeshi, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Jixue, editor, and Bailey, James, editor
- Published
- 2019
- Full Text
- View/download PDF
33. Fast and Exact Nearest Neighbor Search in Hamming Space on Full-Text Search Engines
- Author
-
Mu, Cun (Matthew), Zhao, Jun (Raymond), Yang, Guang, Yang, Binwei, Yan, Zheng (John), Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Amato, Giuseppe, editor, Gennaro, Claudio, editor, Oria, Vincent, editor, and Radovanović, Miloš, editor
- Published
- 2019
- Full Text
- View/download PDF
34. Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search
- Author
-
Jääsaari, Elias, Hyvönen, Ville, Roos, Teemu, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Pandu Rangan, C., Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Yang, Qiang, editor, Zhou, Zhi-Hua, editor, Gong, Zhiguo, editor, Zhang, Min-Ling, editor, and Huang, Sheng-Jun, editor
- Published
- 2019
- Full Text
- View/download PDF
35. Component Placement Process Optimization for Multi-head Surface Mounting Machine Using a Hybrid Algorithm.
- Author
-
Cheng-Jian Lin and Chun-Hui Lin
- Subjects
ASSEMBLY line balancing ,PROCESS optimization ,MIXED integer linear programming ,QUADRATIC assignment problem - Abstract
The article addresses the optimization problem of printed circuit board (PCB) assembly scheduling for surface mount placement (SMP) machines with multiple heads. Topics include studies have oversimplified the practical problems and limited their focus to three sub-problems; and propose a hybrid algorithm to consider the component height, the pick-and-place restrictions, and simultaneous pickup restrictions.
- Published
- 2021
- Full Text
- View/download PDF
36. Toward more efficient locality‐sensitive hashing via constructing novel hash function cluster.
- Author
-
Zhang, Shi, Huang, Jin, Xiao, Ruliang, Du, Xin, Gong, Ping, and Lin, Xinhong
- Subjects
MAXIMUM entropy method ,VECTOR quantization ,DISTRIBUTION (Probability theory) ,ALGORITHMS - Abstract
Locality‐sensitive hashing (LSH) is widely used in the context of nearest neighbor search of large‐scale high‐dimensions. However, there are serious imbalance problems between the efficiency of data index structure construction and the query accuracy of LSH methods. In this article, a novel higher‐entropy‐hyperplane clusters LSH (HEHC‐LSH) algorithm is proposed, which we improve vector quantization to preprocess the data and greatly shortens the preprocessing time; We innovatively integrate the maximum entropy principle into the distribution estimation algorithm to construct a novel hash function cluster method, also incorporate bootstrap aggregating of ensemble learning, and adopt the parallel index dictionary to improve the generalization performance of the index structure. And in the query stage, we realize the comprehensive filtering of index set using integrated learning idea, which not only avoids a lot of distance calculation, but also improves the quality of query results. We also analyze the rationality and effectiveness of the proposed method. Finally, extensive experiment results show that HEHC‐LSH can achieve more higher precision and efficiency simultaneously comparing to current methods, and reflect the strong robustness on different datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
37. Angular Deep Supervised Vector Quantization for Image Retrieval.
- Author
-
Zhou, Chang, Po, Lai Man, and Ou, Weifeng
- Subjects
- *
VECTOR quantization , *IMAGE retrieval , *CENTROID , *PROBLEM solving - Abstract
Most of the deep quantization methods adopt unsupervised approaches, and the quantization process usually occurs in the Euclidean space on top of the deep feature and its approximate value. When this approach is applied to the retrieval tasks, since the internal product space of the retrieval process is different from the Euclidean space of quantization, minimizing the quantization error (QE) does not necessarily lead to a good performance on the maximum inner product search (MIPS). To solve these problems, we treat Softmax classification as vector quantization (VQ) with angular decision boundaries and propose angular deep supervised VQ (ADSVQ) for image retrieval. Our approach can simultaneously learn the discriminative feature representation and the updatable codebook, both lying on a hypersphere. To reduce the QE between centroids and deep features, two regularization terms are proposed as supervision signals to encourage the intra-class compactness and inter-class balance, respectively. ADSVQ explicitly reformulates the asymmetric distance computation in MIPS to transform the image retrieval process into a two-stage classification process. Moreover, we discuss the extension of multiple-label cases from the perspective of quantization with binary classification. Extensive experiments demonstrate that the proposed ADSVQ has excellent performance on four well-known image data sets when compared with the state-of-the-art hashing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
38. Contribution to the Understanding of Protein–Protein Interface and Ligand Binding Site Based on Hydrophobicity Distribution—Application to Ferredoxin I and II Cases.
- Author
-
Banach, Mateusz, Chomilier, Jacques, and Roterman, Irena
- Subjects
LIGAND binding (Biochemistry) ,BINDING sites ,PROTEIN-ligand interactions ,PROTEIN structure prediction ,PROTEIN-protein interactions ,PETROLEUM sales & prices - Abstract
Featured Application: hydrophobicity characteristic of proteins in the sense of fuzzy oil drop model may help predict residues engaged in protein–protein and protein–ligand interaction. Ferredoxin I and II are proteins carrying a specific ligand—an iron-sulfur cluster—which allows transport of electrons. These two classes of ferredoxin in their monomeric and dimeric forms are the object of this work. Characteristic of hydrophobic core in both molecules is analyzed via fuzzy oil drop model (FOD) to show the specificity of their structure enabling the binding of a relatively large ligand and formation of the complex. Structures of FdI and FdII are a promising example for the discussion of influence of hydrophobicity on biological activity but also for an explanation how FOD model can be used as an initial stage adviser (or a scoring function) in the search for locations of ligand binding pockets and protein–protein interaction areas. It is shown that observation of peculiarities in the hydrophobicity distribution present in the molecule (in this case—of a ferredoxin) may provide a promising starting location for computer simulations aimed at the prediction of quaternary structure of proteins. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
39. An efficient evolutionary algorithm with a nearest neighbor search technique for clustering analysis.
- Author
-
Qaddoura, Raneem, Faris, Hossam, and Aljarah, Ibrahim
- Abstract
Evolutionary algorithms have shown their powerful capabilities in different machine learning problems including clustering which is a growing area of research nowadays. In this paper, we propose an efficient clustering technique based on the evolution behavior of genetic algorithm and an advanced variant of nearest neighbor search technique based on assignment and election mechanisms. The goal of the proposed algorithm is to improve the quality of clustering results by finding a solution that maximizes the separation between different clusters and maximizes the cohesion between data points in the same cluster. Our proposed algorithm which we refer to as "EvoNP" is tested with 15 well-known data sets using 5 well-known external evaluation measures and is compared with 7 well-regarded clustering algorithms. The experiments are conducted in two phases: evaluation of the best fitness function for the algorithm and evaluation of the algorithm against other clustering algorithms. The results show that the proposed algorithm works well with silhouette coefficient fitness function and outperforms the other algorithms for the majority of the data sets. The source code of EvoNP is available at http://evo-ml.com/evonp/. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
40. Damage detection of 3D structures using nearest neighbor search method.
- Author
-
Abasi, Ali, Harsij, Vahid, and Soraghi, Ahmad
- Subjects
- *
NEAREST neighbor analysis (Statistics) , *ARTIFICIAL neural networks , *RANDOM noise theory , *PRINCIPAL components analysis , *WHITE noise - Abstract
An innovative damage identification method using the nearest neighbor search method to assess 3D structures is presented. The frequency response function was employed as the input parameters to detect the severity and place of damage in 3D spaces since it includes the most dynamic characteristics of the structures. Two-dimensional principal component analysis was utilized to reduce the size of the frequency response function data. The nearest neighbor search method was employed to detect the severity and location of damage in different damage scenarios. The accuracy of the approach was verified using measured data from an experimental test; moreover, two asymmetric 3D numerical examples were considered as the numerical study. The superiority of the method was demonstrated through comparison with the results of damage identification by using artificial neural network. Different levels of white Gaussian noise were used for polluting the frequency response function data to investigate the robustness of the methods against noise-polluted data. The results indicate that both methods can efficiently detect the damage properties including its severity and location with high accuracy in the absence of noise, but the nearest neighbor search method is more robust against noisy data than the artificial neural network. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
41. Fast Nearest Neighbor Search Based on Approximate k-NN Graph
- Author
-
Yang, Jie, Zhao, Wan-Lei, Deng, Cheng-Hao, Wang, Hanzi, Moon, Sangwhan, Barbosa, Simone Diniz Junqueira, Series Editor, Chen, Phoebe, Series Editor, Filipe, Joaquim, Series Editor, Kotenko, Igor, Series Editor, Sivalingam, Krishna M., Series Editor, Washio, Takashi, Series Editor, Yuan, Junsong, Series Editor, Zhou, Lizhu, Series Editor, Huet, Benoit, editor, Nie, Liqiang, editor, and Hong, Richang, editor
- Published
- 2018
- Full Text
- View/download PDF
42. Efficient and Secure Nearest Neighbor Search Over Encrypted Data in Cloud Environment
- Author
-
Upinder, Kaur, Suri Pushpa, R., Barbosa, Simone Diniz Junqueira, Editorial Board Member, Filipe, Joaquim, Editorial Board Member, Kotenko, Igor, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Chen, Phoebe, Founding Editor, Sivalingam, Krishna M., Founding Editor, Washio, Takashi, Founding Editor, Yuan, Junsong, Founding Editor, Panda, Brajendra, editor, Sharma, Sudeep, editor, and Roy, Nihar Ranjan, editor
- Published
- 2018
- Full Text
- View/download PDF
43. Nearest Neighbor Search Techniques Applied in the Nearest Feature Line Classifier
- Author
-
Guo, Fang, Pan, Jeng-Shyang, Kacprzyk, Janusz, Series editor, Pal, Nikhil R., Advisory editor, Bello Perez, Rafael, Advisory editor, Corchado, Emilio S., Advisory editor, Hagras, Hani, Advisory editor, Kóczy, László T., Advisory editor, Kreinovich, Vladik, Advisory editor, Lin, Chin-Teng, Advisory editor, Lu, Jie, Advisory editor, Melin, Patricia, Advisory editor, Nedjah, Nadia, Advisory editor, Nguyen, Ngoc Thanh, Advisory editor, Wang, Jun, Advisory editor, Krömer, Pavel, editor, Alba, Enrique, editor, Pan, Jeng-Shyang, editor, and Snášel, Václav, editor
- Published
- 2018
- Full Text
- View/download PDF
44. An Approximate Nearest Neighbor Search Algorithm Using Distance-Based Hashing
- Author
-
Itotani, Yuri, Wakabayashi, Shin’ichi, Nagayama, Shinobu, Inagi, Masato, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Hartmann, Sven, editor, Ma, Hui, editor, Hameurlain, Abdelkader, editor, Pernul, Günther, editor, and Wagner, Roland R., editor
- Published
- 2018
- Full Text
- View/download PDF
45. A Fast Two-Level Approximate Euclidean Minimum Spanning Tree Algorithm for High-Dimensional Data
- Author
-
Wang, Xia Li, Wang, Xiaochun, Li, Xiaqiong, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, and Perner, Petra, editor
- Published
- 2018
- Full Text
- View/download PDF
46. An Efficient Exact Nearest Neighbor Search by Compounded Embedding
- Author
-
Li, Mingjie, Zhang, Ying, Sun, Yifang, Wang, Wei, Tsang, Ivor W., Lin, Xuemin, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Pandu Rangan, C., Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Weikum, Gerhard, Series Editor, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Pei, Jian, editor, Manolopoulos, Yannis, editor, Sadiq, Shazia, editor, and Li, Jianxin, editor
- Published
- 2018
- Full Text
- View/download PDF
47. Learning to Index in Large-Scale Datasets
- Author
-
Prayoonwong, Amorntip, Wang, Cheng-Hsien, Chiu, Chih-Yi, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Schoeffmann, Klaus, editor, Chalidabhongse, Thanarat H., editor, Ngo, Chong Wah, editor, Aramvith, Supavadee, editor, O’Connor, Noel E., editor, Ho, Yo-Sung, editor, Gabbouj, Moncef, editor, and Elgammal, Ahmed, editor
- Published
- 2018
- Full Text
- View/download PDF
48. A scalable solution to the nearest neighbor search problem through local-search methods on neighbor graphs.
- Author
-
Tellez, Eric S., Ruiz, Guillermo, Chavez, Edgar, and Graff, Mario
- Subjects
- *
K-nearest neighbor classification , *PROBLEM solving , *METAHEURISTIC algorithms , *COMBINATORIAL optimization - Abstract
Nearest neighbor search is a powerful abstraction for data access; however, data indexing is troublesome even for approximate indexes. For intrinsically high-dimensional data, high-quality fast searches demand either indexes with impractically large memory usage or preprocessing time. In this paper, we introduce an algorithm to solve a nearest-neighbor query q by minimizing a kernel function defined by the distance from q to each object in the database. The minimization is performed using metaheuristics to solve the problem rapidly; even when some methods in the literature use this strategy behind the scenes, our approach is the first one using it explicitly. We also provide two approaches to select edges in the graph's construction stage that limit memory footprint and reduce the number of free parameters simultaneously. We carry out a thorough experimental comparison with state-of-the-art indexes through synthetic and real-world datasets; we found out that our contributions achieve competitive performances regarding speed, accuracy, and memory in almost any of our benchmarks. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
49. Data efficient indoor thermal comfort prediction using instance based transfer learning method.
- Author
-
Li, Kangji, Liu, Yufei, Chen, Lei, and Xue, Wenping
- Abstract
The precise prediction of indoor thermal sensation is critical to building energy efficiency and satisfaction of occupants. In this field, data-driven models have shown superior performance than Predicted Mean Vote (PMV) model. However, a reliable and accurate model of this kind relies on sufficient and high-quality data from physiological and environmental sensing, which in practice is difficult or expensive to collect. Rather than developing a separated predictive model for certain task, transfer learning method could intelligently use knowledge from similar domains to solve the target problem, and thus the requirement of data resources is reduced. In the last few years, transfer learning has been introduced into the field of thermal comfort prediction. Most of them used the source domain data as a whole for model pre-training, and the samples' suitability judgment was not fully considered. But the general situation in this field is: there are few public and available data resources and their data distributions are hardly consistent with that of specific target task. Aiming at the issue, this study proposes an instance based transfer learning model, which is based on data weighting and reusing for data efficient thermal comfort prediction. We assume that samples from source and target domains obey different distributions but have some common features that can be transferred across domains. The Nearest Neighbor Search (NNS) is used for similarity judgement and selection of source domain data; A Boosting type transfer learning algorithm, i.e., TrAdaBoost, is improved for reliable thermal comfort prediction. In the algorithm, an automatic weighting mechanism could optimally adjust the source training data's weights for performance improvement. By using field experimental data set and a public data set as target and source domains data respectively, the performance of the proposed model is investigated. Results show when target domain data is insufficient, the proposed NNS-iTrAdaBoost method has superior transfer learning ability and could achieve much better performance than traditional data-driven models. In practice, the proposed method is easy for engineering implementation and has great potential for thermal comfort prediction based on small-scale experiments. • An improved transfer learning algorithm, called iTrAdaBoost, is designed for indoor thermal comfort prediction. • The Nearest Neighbor Search method is used for similarity judgement and selection of source domain data. • A field thermal comfort experiment in an office room is conducted for target domain data collection. • The performance of proposed method is investigated and compared with other data-driven models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
50. SWINN: Efficient nearest neighbor search in sliding windows using graphs.
- Author
-
Mastelini, Saulo Martiello, Veloso, Bruno, Halford, Max, de Carvalho, André Carlos Ponce de Leon Ferreira, and Gama, João
- Subjects
- *
MACHINE learning , *SEARCH algorithms , *CLASSIFICATION algorithms , *ONLINE algorithms , *ONLINE education , *ELECTRONIC information resource searching - Abstract
Nearest neighbor search (NNS) is one of the main concerns in data stream applications since similarity queries can be used in multiple scenarios. Online NNS is usually performed on a sliding window by lazily scanning every element currently stored in the window. This paper proposes Sliding Window-based Incremental Nearest Neighbors (SWINN), a graph-based online search index algorithm for speeding up NNS in potentially never-ending and dynamic data stream tasks. Our proposal broadens the application of online NNS-based solutions, as even moderately large data buffers become impractical to handle when a naive NNS strategy is selected. SWINN enables efficient handling of large data buffers by using an incremental strategy to build and update a search graph supporting any distance metric. Vertices can be added and removed from the search graph. To keep the graph reliable for search queries, lightweight graph maintenance routines are run. According to experimental results, SWINN is significantly faster than performing a naive complete scan of the data buffer while keeping competitive search recall values. We also apply SWINN to online classification and regression tasks and show that our proposal is effective against popular online machine learning algorithms. • We introduce SWINN, a graph-based nearest neighbor search index for sliding windows. • SWINN works with any distance metric and is robust against concept drift. • SWINN is significantly faster than a linear scan of the data buffer. • Our proposal delivers competitive search recalls compared to the traditional strategy. • SWINN is successfully applied to classification and regression case studies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.