DenPEHC: Density peak based efficient hierarchical clustering
- Source :
- Information Sciences. 373:200-218
- Publication Year :
- 2016
- Publisher :
- Elsevier BV, 2016.
Abstract
- Existing hierarchical clustering algorithms involve a flat clustering component and an additional agglomerative or divisive procedure. This paper presents a density peak based hierarchical clustering method (DenPEHC), which directly generates clusters on each possible clustering layer, and introduces a grid granulation framework to enable DenPEHC to cluster large-scale and high-dimensional (LSHD) datasets. This study consists of three parts: (1) utilizing the distribution of the parameter γ, which is defined as the product of the local density ρ and the minimal distance to data points with higher density δ in “clustering by fast search and find of density peaks” (DPClust), and a linear fitting approach to select clustering centers with the clustering hierarchy decided by finding the “stairs” in the γ curve; (2) analyzing the leading tree (in which each node except the root is led by its parent to join the same cluster) as an intermediate result of DPClust, and constructing the clustering hierarchy efficiently based on the tree; and (3) designing a framework to enable DenPEHC to cluster LSHD datasets when a large number of attributes can be grouped by their semantics. The proposed method builds the clustering hierarchy by simply disconnecting the center points from their parents with a linear computational complexity O(m), where m is the number of clusters. Experiments on synthetic and real datasets show that the proposed method has promising efficiency, accuracy and robustness compared to state-of-the-art methods.
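The quantities named in the abstract (ρ, δ, γ = ρ·δ, and the leading tree) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: the Gaussian-kernel density with a cutoff `dc`, the function names `leading_tree` and `cut_tree`, and picking the `m` largest γ values as centers, which stands in for the paper's linear fitting and "stairs" detection. Only the step that disconnects the centers from their parents corresponds to the O(m) operation mentioned above; the label propagation that follows visits every point.

```python
import numpy as np

def leading_tree(X, dc):
    """Compute rho, delta, gamma and the leading-tree parent of each point.

    X is an (n, d) array; dc is an assumed kernel cutoff (hypothetical parameter).
    """
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Gaussian-kernel local density; subtract 1 to drop the self term exp(0).
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0
    # Point indices sorted by decreasing density.
    order = np.argsort(-rho, kind="stable")
    n = len(X)
    delta = np.zeros(n)
    parent = np.full(n, -1)  # -1 marks "no parent" (the root)
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point is the root; use its largest distance as delta.
            delta[i] = D[i].max()
            continue
        higher = order[:rank]                # points denser than i
        j = higher[np.argmin(D[i, higher])]  # nearest denser point
        delta[i] = D[i, j]
        parent[i] = j                        # i is "led" by j
    gamma = rho * delta
    return rho, delta, gamma, parent, order

def cut_tree(gamma, parent, order, m):
    """Split the leading tree into m clusters by detaching m centers."""
    # Assumption: take the m largest gamma values as centers.
    centers = np.argsort(-gamma, kind="stable")[:m]
    root = order[0]
    if root not in centers:
        centers[-1] = root   # the root must head its own subtree
    par = parent.copy()
    par[centers] = -1        # disconnect centers from parents: O(m)
    labels = np.full(len(par), -1)
    for k, c in enumerate(centers):
        labels[c] = k
    # Parents are always denser than children, so density order
    # guarantees a parent is labeled before any of its children.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[par[i]]
    return labels
```

Calling `cut_tree` again with a different `m` on the same `parent` array yields another layer of the hierarchy without recomputing densities or distances, which is the efficiency argument the abstract makes for working on the leading tree.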
- Subjects :
- 0209 industrial biotechnology
Information Systems and Management
Fuzzy clustering
Single-linkage clustering
Correlation clustering
02 engineering and technology
computer.software_genre
Complete-linkage clustering
Computer Science Applications
Theoretical Computer Science
Hierarchical clustering
ComputingMethodologies_PATTERNRECOGNITION
020901 industrial engineering & automation
Artificial Intelligence
Control and Systems Engineering
CURE data clustering algorithm
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Data mining
Cluster analysis
computer
Algorithm
Software
k-medians clustering
Mathematics
Details
- ISSN :
- 0020-0255
- Volume :
- 373
- Database :
- OpenAIRE
- Journal :
- Information Sciences
- Accession number :
- edsair.doi...........a6b55881b4bad5055c2c54c77bbb693a
- Full Text :
- https://doi.org/10.1016/j.ins.2016.08.086