
Robust Model Design by Comparative Evaluation of Clustering Algorithms

Authors :
Xiaopeng Chen
Chanseok Park
Xuehong Gao
Bosung Kim
Source :
IEEE Access, Vol 11, Pp 88135-88151 (2023)
Publication Year :
2023
Publisher :
IEEE, 2023.

Abstract

The K-means algorithm, widely used in cluster analysis, is a centroid-based clustering method known for its high efficiency and scalability. In realistic settings, however, data are often contaminated by outliers and by departures from the assumed distribution, which can distort or invalidate K-means clustering results. In this paper, we introduce three alternative algorithms for weighted data, namely K-weighted-medians, K-weighted-L2-medians, and K-weighted-HLs, to address these issues. The impact of contamination is investigated by examining its effect on the estimation of optimal cluster centroids. We explore the robustness of the clustering algorithms from the perspective of the breakdown point, and then conduct experiments on simulated and real datasets to evaluate their performance using two new numerical metrics: relative efficiencies based on generalized variance and average Euclidean distance. The results demonstrate the effectiveness of the proposed K-weighted-HLs algorithm, which outperforms the other algorithms under both types of contamination.
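
To illustrate why a median-type centroid can resist outliers better than the mean used by K-means, here is a minimal Python sketch of a single centroid update on weighted data. It is not the authors' implementation: the abstract does not define the estimators, so the coordinate-wise lower weighted median used here, and the helper names `weighted_mean`, `weighted_median`, and `update_centroid`, are assumptions made for illustration only.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: the K-means-style centroid coordinate."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_median(values, weights):
    """Lower weighted median: the smallest value whose cumulative
    weight reaches at least half of the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]


def update_centroid(points, weights, estimator):
    """Apply a one-dimensional location estimator to each coordinate
    of the points currently assigned to one cluster (an assumption:
    the paper may define the multivariate centroid differently)."""
    dims = len(points[0])
    return tuple(
        estimator([p[d] for p in points], weights) for d in range(dims)
    )


# One cluster with three nearby points and a single gross outlier.
cluster = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (100.0, 100.0)]
unit_weights = [1.0, 1.0, 1.0, 1.0]

mean_centroid = update_centroid(cluster, unit_weights, weighted_mean)
median_centroid = update_centroid(cluster, unit_weights, weighted_median)
```

In this toy cluster the mean-based centroid is pulled far toward the outlier, while the median-based centroid stays among the three nearby points. A full clustering loop would alternate this update with reassigning each point to its nearest centroid.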

Details

Language :
English
ISSN :
2169-3536
Volume :
11
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.54ae6baa35a046328bcfd8835813d8a4
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2023.3306023