5 results for "Yueguo Luo"
Search Results
2. Location difference of multiple distances based k-nearest neighbors algorithm
- Author
- Guanghua Zhang, Yueguo Luo, Li-mei Dong, Zhongyang Xiong, and Shuyin Xia
- Subjects
Clustering high-dimensional data; Information Systems and Management; Computer science; business.industry; Pattern recognition; Search tree; Management Information Systems; k-nearest neighbors algorithm; Tree (data structure); Tree structure; Artificial Intelligence; Search algorithm; Artificial intelligence; business; Time complexity; Algorithm; Software
- Abstract
The "location difference of multiple distances" and a method LDMDBA are proposed.LDMDBA has a time complexity of O(logdnlogn) and does not rely on tree structures.Only LDMDBA can be efficiently applied to high dimensional data.LDMDBA has a time complexity of (logdlogn) for predicting a new data point.LDMDBA has very good stability and can be applied to large databases. k-nearest neighbors (kNN) classifiers are commonly used in various applications due to their relative simplicity and the absence of necessary training. However, the time complexity of the basic algorithm is quadratic, which makes them inappropriate for large scale datasets. At the same time, the performance of most improved algorithms based on tree structures decreases rapidly with increase in dimensionality of dataset, and tree structures have different complexity in different datasets. In this paper, we introduce the concept of "location difference of multiple distances, and use it to measure the difference between different data points. In this way, location difference of multiple distances based nearest neighbors searching algorithm (LDMDBA) is proposed. LDMDBA has a time complexity of O(logdnlogn) and does not rely on a search tree. This makes LDMDBA the only kNN method that can be efficiently applied to high dimensional data and has very good stability on different datasets. In addition, most of the existing methods have a time complexity of O(n) to predict a data point outside the dataset. By contrast, LDMDBA has a time complexity of O(logdlogn) to predict a query point in datasets of different dimensions, and, therefore, can be applied in real systems and large scale databases. The effectiveness and efficiency of LDMDBA are demonstrated in experiments involving public and artificial datasets.
- Published
- 2015
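A minimal Python sketch of the reference-point idea suggested by the abstract above. The abstract does not spell out LDMDBA's construction, so the random reference points, the binary-search lookup, and the fixed candidate window below are illustrative assumptions rather than the paper's algorithm; the sketch only shows how an O(log n) positional lookup per reference order can stand in for a tree-based search.

    import numpy as np

    def build_index(data, n_refs=3, seed=0):
        # Sort the dataset by distance to a few reference points (assumed
        # random here); one sorted order is kept per reference.
        rng = np.random.default_rng(seed)
        refs = data[rng.choice(len(data), n_refs, replace=False)]
        dists = np.linalg.norm(data[:, None, :] - refs[None, :, :], axis=2)
        order = np.argsort(dists, axis=0)
        sorted_d = np.take_along_axis(dists, order, axis=0)
        return refs, order, sorted_d

    def knn_query(q, data, index, k=5, width=20):
        # Binary-search the query's position in each sorted order
        # (O(log n) per reference), collect nearby points as candidates,
        # then rank the small candidate set by exact distance.
        refs, order, sorted_d = index
        cand = set()
        for j in range(len(refs)):
            pos = int(np.searchsorted(sorted_d[:, j], np.linalg.norm(q - refs[j])))
            lo, hi = max(0, pos - width), min(len(data), pos + width)
            cand.update(order[lo:hi, j].tolist())
        cand = np.fromiter(cand, dtype=int)
        exact = np.linalg.norm(data[cand] - q, axis=1)
        return cand[np.argsort(exact)[:k]]

    data = np.random.rand(1000, 16)
    index = build_index(data)
    print(knn_query(np.random.rand(16), data, index, k=5))

Widening the candidate window trades speed for recall; the exact re-ranking keeps the answer correct within whatever candidate set is gathered.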
3. Effectiveness of the Euclidean distance in high dimensional spaces
- Author
- Shuyin Xia, Zhongyang Xiong, Wei Xu, Guanghua Zhang, and Yueguo Luo
- Subjects
Euclidean distance; Mahalanobis distance; Distance matrix; Distance from a point to a plane; Mathematical analysis; Minkowski distance; Electrical and Electronic Engineering; Euclidean distance matrix; Weighted Voronoi diagram; Atomic and Molecular Physics, and Optics; Distance from a point to a line; Electronic, Optical and Magnetic Materials; Mathematics
- Abstract
This paper presents an analysis of the applicability and performance of the Euclidean distance in relation to the dimensionality of the space. The effect of dimensionality on the behavior of the Euclidean distance is explored. Furthermore, it is shown that the minimum distance approaches the maximum distance under a broader set of conditions, without requiring the calculation of the variance of the random variables. It is demonstrated that the minimum distance approaches the maximum distance even for some low-dimensional distributions, such as the normal distribution. Many proposed measures not based directly on the Euclidean distance cannot enlarge the difference between the closest and farthest points. The analysis has been performed on a wide range of artificial and publicly available datasets. As the variables of different distributions have different convergence rates, the results should not be interpreted to mean that the Euclidean distance is not applicable. In fact, experiments show that the Euclidean distance remains very useful for the noncentral t-distribution even at dimensionalities above 10,000. Furthermore, the behavior of the Euclidean distance becomes more useful as the number of samples increases. (A numeric illustration follows this entry.)
- Published
- 2015
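The distance-concentration effect this abstract analyzes is easy to reproduce numerically. The Python snippet below is a minimal demonstration on i.i.d. uniform data (an assumption; the paper studies a range of distributions): it prints the relative contrast (d_max - d_min) / d_min of distances from a query to a random sample, which shrinks as the dimensionality grows.

    import numpy as np

    rng = np.random.default_rng(0)
    for dim in (2, 10, 100, 1000, 10000):
        points = rng.random((500, dim))          # i.i.d. uniform sample
        query = rng.random(dim)
        d = np.linalg.norm(points - query, axis=1)
        # Relative contrast: how much farther the farthest point is than
        # the nearest one. It falls toward 0 as the dimension grows.
        print(f"dim={dim:>5}  contrast={(d.max() - d.min()) / d.min():.4f}")

As the abstract notes, convergence rates differ across distributions, so a small contrast for one distribution does not mean the Euclidean distance is unusable in general.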
4. A method to improve support vector machine based on distance to hyperplane
- Author
- Zhongyang Xiong, Yueguo Luo, Shuyin Xia, and Li-mei Dong
- Subjects
Training set; Computer science; Process (computing); Atomic and Molecular Physics, and Optics; Electronic, Optical and Magnetic Materials; Support vector machine; Kernel (linear algebra); Kernel method; Hyperplane; Dimension (vector space); Ranking SVM; Electrical and Electronic Engineering; Algorithm; Time complexity
- Abstract
As the SVM (support vector machine) has good generalizability, it has been successfully applied in a variety of applications. Yet, to solve its mathematical model, an SVM needs to compute the kernel matrix, whose dimension equals the number of records in the training set, so computing it is very costly in terms of memory. Although improved algorithms have been proposed to decrease the memory requirement, most of them need iterative computations that cost too much time. Since existing SVM models fail to perform well in terms of both runtime and memory, we propose a new method that decreases memory consumption without requiring any iteration. In this method, an effective measure in kernel space is proposed to extract a subset of the database that includes the support vectors. In this way, the number of samples participating in the training process decreases, which accelerates training to a time complexity of only O(n log n). Another advantage of the method is that it can be used in conjunction with other SVM methods. Experiments demonstrate the effectiveness and efficiency of SVM algorithms enhanced with the proposed method. (An illustrative sketch follows this entry.)
- Published
- 2015
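This abstract names a kernel-space measure for extracting a subset that contains the support vectors but does not define it. The Python sketch below uses a stand-in heuristic (distance to the nearest opposite-class point, an assumption, not the paper's measure) to illustrate the workflow: keep the points most likely to be support vectors, then train a standard SVM on the much smaller subset.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    def boundary_subset(X, y, frac=0.3):
        # Keep, per class, the fraction of points closest to the other
        # class. Stand-in for the paper's kernel-space measure: such
        # points are the likeliest support vectors, so training on them
        # alone shrinks the kernel matrix.
        keep = []
        for c in np.unique(y):
            same, other = X[y == c], X[y != c]
            d = np.linalg.norm(same[:, None, :] - other[None, :, :], axis=2).min(axis=1)
            idx = np.flatnonzero(y == c)[np.argsort(d)[: max(1, int(frac * len(same)))]]
            keep.append(idx)
        return np.concatenate(keep)

    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    sub = boundary_subset(X, y)
    full = SVC(kernel="rbf").fit(X, y)
    small = SVC(kernel="rbf").fit(X[sub], y[sub])
    print(len(sub), full.score(X, y), small.score(X, y))

Because only the selection step changes, the same scheme can wrap any SVM solver, which matches the abstract's claim that the method can be used in conjunction with other SVM methods.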
5. Relative density based support vector machine
- Author
- Li-mei Dong, Yueguo Luo, Zhongyang Xiong, Shuyin Xia, and Changyuan Xing
- Subjects
Cognitive Neuroscience; Feature vector; Boundary (topology); computer.software_genre; Cross-validation; Computer Science Applications; Support vector machine; Kernel method; Data point; Dimension (vector space); Artificial Intelligence; Data mining; Algorithm; computer; Time complexity; Mathematics
- Abstract
As a support vector machine (SVM) has good generalization ability, it has been implemented in various applications. Yet, to solve the mathematical model, it needs to compute the kernel matrix, whose dimension equals the number of data points in the dataset, which consumes a very large amount of memory. Improved algorithms have been proposed to extract the boundary of the dataset, so that fewer data points participate in the training process and training is accelerated. But most of these algorithms discard many support vectors, so their prediction accuracy is low. Moreover, those methods all need to perform the main computation with the kernel function in the feature space, which increases the computational cost. In this paper, the concept of "relative density" is proposed to extract the subset containing the support vectors. The measure is designed to be more meticulous, so the new method performs more precisely than existing methods. Because relative density has good local characteristics, the proposed method performs its computations in the original space without using any kernel function, which also improves efficiency. Furthermore, the proposed method can be used to detect noise data, by which an inseparable problem can be transformed into a separable one so that cross-validation can be avoided in various SVM algorithms; this is an advantage that none of the existing SVM methods has. Yet another advantage is that the method can serve as a framework for various SVM methods. This paper presents the details of the proposed accelerated algorithm, which has a time complexity of O(n log n) and decreases training time significantly without decreasing prediction accuracy. The effectiveness and efficiency of the method are demonstrated through experiments on artificial and public datasets. (An illustrative sketch follows this entry.)
- Published
- 2015
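"Relative density" is not defined in this abstract, so the Python sketch below substitutes an LOF-style ratio (a point's local density over the mean density of its k nearest neighbors) as an illustrative stand-in. In line with the abstract's efficiency argument, it works entirely in the original space with no kernel function; the lowest-scoring points are boundary or noise candidates.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def relative_density(X, k=10):
        # LOF-style stand-in for the paper's measure: each point's local
        # density divided by the mean density of its k nearest neighbors.
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
        dist, idx = nn.kneighbors(X)             # column 0 is the point itself
        density = 1.0 / (dist[:, 1:].mean(axis=1) + 1e-12)
        return density / density[idx[:, 1:]].mean(axis=1)

    X = np.random.default_rng(0).normal(size=(500, 2))
    rd = relative_density(X)
    print(np.argsort(rd)[:20])                   # likeliest boundary/noise points

Feeding only the low-relative-density points (or dropping the very lowest as noise) into an SVM trainer would reproduce the subset-extraction and noise-detection roles the abstract describes.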