
WeDIV – An improved k-means clustering algorithm with a weighted distance and a novel internal validation index

Authors :
Zilan Ning
Jin Chen
Jianjun Huang
Umar Jibrilla Sabo
Zheming Yuan
Zhijun Dai
Source :
Egyptian Informatics Journal, Vol 23, Iss 4, Pp 133-144 (2022)
Publication Year :
2022
Publisher :
Elsevier, 2022.

Abstract

Designing appropriate similarity metrics (distances) and estimating the optimal number of clusters are two important issues in cluster analysis. This study proposed an improved k-means clustering algorithm with a Weighted Distance and a novel Internal Validation index (WeDIV). The weighted distance, EP_dis, was designed by weighting the relative contributions of the Euclidean and Pearson distances. This strategy can effectively capture both the global spatial correlation and the local variation trend of the data in high-dimensional space. The new internal validation index, RCH, inspired by the Calinski-Harabasz (CH) index and the analysis of variance, was developed to automatically estimate the optimal number of clusters. EP_dis was proved mathematically reliable and was validated on two simulated datasets. Four simulated datasets with different properties were used to validate the effectiveness of RCH. Furthermore, we compared the clustering performance of WeDIV with 12 prevailing clustering algorithms on 16 UCI datasets. The results demonstrated that WeDIV outperforms the other algorithms regardless of whether the number of clusters is specified.
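The two ingredients named in the abstract can be illustrated with a minimal sketch. The abstract does not give the EP_dis or RCH formulas, so the code below rests on stated assumptions: `weighted_distance` is a hypothetical convex combination `w * Euclidean + (1 - w) * Pearson distance` (with Pearson distance taken as `1 - r`), and `calinski_harabasz` is the standard CH index that RCH builds on, not RCH itself. The function names and the weight `w` are illustrative only. The Euclidean term reacts to absolute position (the "global spatial" part), while the Pearson term reacts only to the shape of a profile across features (the "local trend" part).

```python
import numpy as np


def euclidean_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Straight-line distance between two feature vectors."""
    return float(np.linalg.norm(x - y))


def pearson_distance(x: np.ndarray, y: np.ndarray) -> float:
    """1 - Pearson correlation: 0 for identical trends, 2 for opposite trends.

    Requires at least two features and non-constant vectors (otherwise r is NaN).
    """
    r = np.corrcoef(x, y)[0, 1]
    return float(1.0 - r)


def weighted_distance(x: np.ndarray, y: np.ndarray, w: float = 0.5) -> float:
    """Hypothetical weighted distance, NOT the paper's EP_dis formula.

    Assumption: w in [0, 1] weights the Euclidean term and (1 - w) the Pearson
    term. Any normalization the paper applies to make the terms comparable is
    omitted here.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * euclidean_distance(x, y) + (1.0 - w) * pearson_distance(x, y)


def calinski_harabasz(X: np.ndarray, labels: np.ndarray) -> float:
    """Standard Calinski-Harabasz index (the basis of RCH, not RCH itself).

    CH = [tr(B) / (k - 1)] / [tr(W) / (n - k)], where tr(B) is the
    between-cluster and tr(W) the within-cluster sum of squared deviations.
    """
    n = X.shape[0]
    clusters = np.unique(labels)
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need 1 < k < n clusters")
    overall_mean = X.mean(axis=0)
    between = 0.0
    within = 0.0
    for c in clusters:
        members = X[labels == c]
        centroid = members.mean(axis=0)
        # Between-cluster dispersion, weighted by cluster size.
        between += len(members) * np.sum((centroid - overall_mean) ** 2)
        # Within-cluster dispersion around the cluster's own centroid.
        within += np.sum((members - centroid) ** 2)
    return float((between / (k - 1)) / (within / (n - k)))


if __name__ == "__main__":
    a = np.array([1.0, 2.0, 3.0])
    b = np.array([2.0, 4.0, 6.0])  # same trend as a, different magnitude
    print(euclidean_distance(a, b), pearson_distance(a, b), weighted_distance(a, b))

    X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
    print(calinski_harabasz(X, np.array([0, 0, 1, 1])))  # 400.0
```

In the usage example, `a` and `b` are far apart in Euclidean terms but have a Pearson distance of essentially zero, which is the kind of complementary information a weighted combination is meant to capture. For choosing the number of clusters, a common practice with the plain CH index is to run k-means for several values of k and keep the one with the highest score; how RCH modifies this is described in the full paper, not here.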

Details

Language :
English
ISSN :
11108665
Volume :
23
Issue :
4
Database :
Directory of Open Access Journals
Journal :
Egyptian Informatics Journal
Publication Type :
Academic Journal
Accession number :
edsdoj.6177a6bf6042aebbefd865e11470d7
Document Type :
article
Full Text :
https://doi.org/10.1016/j.eij.2022.09.002