1. Effective hierarchical clustering based on structural similarities in nearest neighbor graphs
- Author
-
Qinglan Peng, Kenji Leibnitz, Chunrong Wu, Yunni Xia, and Jia Lee
- Subjects
Information Systems and Management ,Computer science ,business.industry ,Rand index ,Pattern recognition ,Management Information Systems ,k-nearest neighbors algorithm ,Hierarchical clustering ,ComputingMethodologies_PATTERNRECOGNITION ,Similarity (network science) ,Nearest neighbor graph ,Artificial Intelligence ,Metric (mathematics) ,Cluster (physics) ,Artificial intelligence ,Cluster analysis ,business ,Software - Abstract
Hierarchical clustering allows better performance in grouping heterogeneous and non-spherical datasets than the center-based clustering, at the expense of increased time complexity. Meanwhile, the bottom-up approach of hierarchical clustering methods often tend to be sensitive or vulnerable to datasets containing obscure cluster boundaries. This paper presents an effective method for hierarchical clustering, called HCNN, which utilizes two types of structural similarities in nearest neighbor graph of a dataset to group similar data into clusters. In particular, the first metric is used to identify those pairs of data with maximal similarity in their nearest neighborhoods that can serve as local centers of every initial cluster. This can contribute to observing the boundaries and detecting clusters, hubs and outliers, thereby alleviating remarkably the influence of obscure boundaries between clusters. The initial clusters will be merged recursively in accordance with the second similarity metric measured in terms of the connectivity between clusters in the nearest neighbor graph. In this case, rather than combining two clusters with highest similarity during each iteration, we consider the maximum similarity as a transitive and closure relation between clusters, i.e., an equivalence relation, which enables more effective and efficient merging of clusters via application of advanced data structures. Experiments based on synthetic and real datasets demonstrate that the proposed HCNN algorithms can possibly outperform the state-of-the-art clustering methods evaluated in terms of the clustering accuracy, normalized mutual information and adjust rand index. Moreover, experiments on unlabeled real datasets show that our method enables effective identification of the quantity of clusters based on Davies–Bouldin index and average silhouette coefficient.
- Published
- 2021
- Full Text
- View/download PDF