Robust clustering by identifying the veins of clusters based on kernel density estimation.

Authors :
Zhou, Zhou
Si, Gangquan
Zhang, Yanbin
Zheng, Kai
Source :
Knowledge-Based Systems. Nov2018, Vol. 159, p309-320. 12p.
Publication Year :
2018

Abstract

Highlights
• A robust clustering algorithm (IVDPC) is proposed to solve the “chain reaction” and cut-off distance selection problems of DPC.
• A new similarity coefficient is introduced to represent the relevance between points; it is an extension of γ as defined in DPC.
• The local density is estimated through a non-parametric density estimation method, eliminating the reliance on the user-defined parameter d_c.
• Clusters are characterized by veins rather than by one representative point, which allows IVDPC to identify the main structure of clusters more clearly and precisely.
• The robustness of the algorithm with respect to the choice of input parameters is proved via a statistical method.

Abstract
Clustering by fast search and find of density peaks (DPC) is an efficient clustering algorithm proposed by Rodriguez and Laio [49]. It adopts a concise but effective categorizing strategy that assigns each data point to the same cluster as its nearest neighbor with higher density. However, because of this simplistic strategy, it suffers from the so-called “chain reaction”. Moreover, the accuracy of DPC depends heavily on the choice of the cut-off distance d_c when the scale of the data varies. To take advantage of DPC while avoiding these drawbacks, this paper proposes a robust clustering algorithm named IVDPC, which provides a feasible approach for classifying data with different shapes and distributions. The local density is first estimated through a non-parametric density estimation method. Then, by calculating the similarity matrix of the points and successively connecting the most similar pairs from high-density regions to the edges of clusters, IVDPC identifies the main structure (veins) of the clusters and assigns each of the remaining samples to its nearest vein. Using veins rather than one representative point to represent a cluster allows IVDPC to adapt well to the geometry of non-spherical shapes and reduces the chain reaction of DPC. The proposed method is benchmarked on artificial and real-world data sets against several baseline methods. The experimental results demonstrate that IVDPC can recognize the structural distribution of clusters and achieves higher clustering accuracy than several state-of-the-art algorithms. [ABSTRACT FROM AUTHOR]
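
The baseline strategy described in the abstract (assign each point to the same cluster as its nearest neighbor with higher density) can be illustrated with a minimal sketch. This is not the authors' IVDPC implementation: it does not build veins or the similarity matrix, and it only shows plain DPC-style assignment with the local density taken from a Gaussian kernel density estimate instead of a cut-off distance. The function names, the `bandwidth` parameter, and the rule of picking the `n_clusters` points with the largest γ = ρ·δ as centers are assumptions made for illustration; the record does not state which estimator or selection rule the paper uses.

```python
import math


def gaussian_kde_density(points, bandwidth):
    """Gaussian kernel density at each point (normalizing constant omitted)."""
    n = len(points)
    densities = []
    for p in points:
        total = 0.0
        for q in points:
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        densities.append(total / n)
    return densities


def dpc_cluster(points, n_clusters, bandwidth):
    """DPC-style clustering with KDE densities (illustrative sketch only)."""
    n = len(points)
    rho = gaussian_kde_density(points, bandwidth)

    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: -rho[i])

    delta = [0.0] * n   # distance to the nearest point of higher density
    parent = [-1] * n   # index of that nearest higher-density point
    for rank, i in enumerate(order):
        if rank == 0:
            continue
        best_d, best_j = math.inf, -1
        for j in order[:rank]:
            d = math.dist(points[i], points[j])
            if d < best_d:
                best_d, best_j = d, j
        delta[i], parent[i] = best_d, best_j

    # By convention the densest point gets the largest distance in the data.
    top = order[0]
    delta[top] = max(math.dist(points[top], q) for q in points)

    # gamma = rho * delta; the densest point is always kept as a center.
    gamma = [rho[i] * delta[i] for i in range(n)]
    others = sorted((i for i in range(n) if i != top), key=lambda i: -gamma[i])
    centers = [top] + others[: n_clusters - 1]

    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Each remaining point inherits the label of its higher-density neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


if __name__ == "__main__":
    pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
           (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (5.05, 5.05)]
    print(dpc_cluster(pts, n_clusters=2, bandwidth=0.5))
```

Because every non-center point simply copies the label of one higher-density neighbor, a single wrong link propagates to all points that depend on it. This is the “chain reaction” that the abstract says IVDPC is designed to reduce.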

Details

Language :
English
ISSN :
09507051
Volume :
159
Database :
Academic Search Index
Journal :
Knowledge-Based Systems
Publication Type :
Academic Journal
Accession number :
131729410
Full Text :
https://doi.org/10.1016/j.knosys.2018.06.021