Robust and sparse k-means clustering for high-dimensional data

Authors :
Sarka Brodinova
Maia Rohm
Christian Breiteneder
Thomas Ortner
Peter Filzmoser
Source :
Advances in Data Analysis and Classification.
Publication Year :
2019
Publisher :
Springer Science and Business Media LLC, 2019.

Abstract

We introduce a robust k-means-based clustering method for high-dimensional data in which not only outliers but also a large number of noise variables are very likely to be present [4]. Although Kondo et al. [2] have already addressed this application scenario, our approach goes further. First, the method is designed to identify clusters, informative variables, and outliers simultaneously. Second, it also aims to optimize the required parameters, e.g. the number of clusters, which is an advantage over most existing methods. Robustness is achieved through a robust initialization [3] and a proposed weighting function based on the Local Outlier Factor [1]. The weighting function also provides information about the outlyingness of each observation for subsequent outlier detection. To reveal both clusters and informative variables properly, the approach uses a lasso-type penalty [5]. The method has been thoroughly tested on simulated as well as real high-dimensional datasets. The experiments demonstrated the method's ability to identify clusters, outliers, and informative variables.
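
As an illustration of how a lasso-type penalty can separate informative variables from noise variables, the sketch below implements the soft-thresholding update for variable weights familiar from sparse k-means formulations. It is an assumption that this resembles the step used in the paper; the abstract does not give the exact update. The names `variable_weights`, `bcss` (per-variable between-cluster sum of squares), and `l1_bound` are hypothetical and chosen only for this example.

```python
import math


def soft_threshold(values, delta):
    """Shrink each value towards zero by delta, clipping at zero."""
    return [max(v - delta, 0.0) for v in values]


def l2_normalize(values):
    """Scale a vector to unit Euclidean length (all zeros stay all zeros)."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0.0:
        return [0.0 for _ in values]
    return [v / norm for v in values]


def variable_weights(bcss, l1_bound, tol=1e-10, max_iter=200):
    """Return non-negative variable weights with unit L2 norm and L1 norm <= l1_bound.

    bcss     : hypothetical input, one between-cluster sum of squares per variable.
    l1_bound : sparsity parameter, assumed to lie between 1 and sqrt(len(bcss)).
    """
    a = [max(v, 0.0) for v in bcss]

    # If the unpenalized solution already satisfies the L1 bound, keep it.
    w = l2_normalize(a)
    if sum(w) <= l1_bound:
        return w

    # Otherwise bisect on the threshold until the L1 bound is just met.
    lo, hi = 0.0, max(a)
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        w = l2_normalize(soft_threshold(a, mid))
        if sum(w) > l1_bound:
            lo = mid   # not sparse enough, raise the threshold
        else:
            hi = mid   # feasible, try a smaller threshold
        if hi - lo < tol:
            break
    return l2_normalize(soft_threshold(a, hi))


if __name__ == "__main__":
    # Two variables separate the clusters well; three behave like noise.
    weights = variable_weights([10.0, 8.0, 0.5, 0.3, 0.1], l1_bound=1.2)
    print(weights)
```

In this sketch, variables whose between-cluster separation falls below the threshold receive a weight of exactly zero, which is how the penalty flags them as noise variables. A smaller `l1_bound` gives a sparser weight vector.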

Details

ISSN :
1862-5355 and 1862-5347
Database :
OpenAIRE
Journal :
Advances in Data Analysis and Classification
Accession number :
edsair.doi.dedup.....18bc50e0725615fc46359c7b9c847ccf
Full Text :
https://doi.org/10.1007/s11634-019-00356-9