Robust and sparse k-means clustering for high-dimensional data
- Source :
- Advances in Data Analysis and Classification.
- Publication Year :
- 2019
- Publisher :
- Springer Science and Business Media LLC, 2019.
Abstract
- We introduce a robust k-means-based clustering method for high-dimensional data in which not only outliers but also a large number of noise variables are very likely to be present [4]. Although Kondo et al. [2] already addressed this application scenario, our approach goes further in two respects. First, the method is designed to identify clusters, informative variables, and outliers simultaneously. Second, it also aims to optimize the required parameters, such as the number of clusters, which is an advantage over most existing methods. Robustness is achieved through a robust initialization [3] and a proposed weighting function based on the Local Outlier Factor [1]. This weighting function also provides valuable information about the outlyingness of each observation for subsequent outlier detection. To reveal both the clusters and the informative variables, the approach uses a lasso-type penalty [5]. The method has been thoroughly tested on simulated as well as real high-dimensional datasets, and the experiments demonstrated its strong ability to identify clusters, outliers, and informative variables.
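The abstract names two ingredients: observation weights derived from the Local Outlier Factor (LOF) and a lasso-type penalty that selects informative variables. The pure-Python sketch below illustrates both under stated assumptions. `lof_scores` is a simplified LOF (exactly k neighbours, distinct points). `outlier_weights` is a hypothetical weighting (full weight for LOF ≤ 1, otherwise 1/LOF) and is not the weighting function proposed in the paper. `lasso_feature_weights` follows the well-known sparse k-means feature-weight update of Witten and Tibshirani (2010); treating it as the penalty referenced by [5] is an assumption. All function names are illustrative, and the robust initialization [3] is not shown.

```python
import math


def lof_scores(points, k):
    """Simplified Local Outlier Factor: exactly k neighbours per point
    (ties at the k-distance are not expanded). Assumes distinct points."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    k_distance = [dist[i][knn[i][-1]] for i in range(n)]

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbour j.
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, j) for j in knn[i]) for i in range(n)]
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]


def outlier_weights(lof):
    """HYPOTHETICAL weighting: weight 1 for LOF <= 1, 1/LOF otherwise.
    Not the weighting function proposed in the paper."""
    return [min(1.0, 1.0 / score) for score in lof]


def between_cluster_ss(data, labels):
    """Per-feature between-cluster sum of squares (total SS minus within SS)."""
    result = []
    for j in range(len(data[0])):
        col = [row[j] for row in data]
        mean = sum(col) / len(col)
        total = sum((x - mean) ** 2 for x in col)
        within = 0.0
        for c in set(labels):
            vals = [x for x, lab in zip(col, labels) if lab == c]
            m = sum(vals) / len(vals)
            within += sum((x - m) ** 2 for x in vals)
        result.append(total - within)
    return result


def lasso_feature_weights(bcss, s, iters=60):
    """Sparse k-means feature weights (Witten & Tibshirani, 2010): maximise
    sum(w_j * bcss_j) subject to ||w||_2 <= 1, ||w||_1 <= s, w >= 0.
    The solution is a soft-thresholded, L2-normalised bcss vector; the
    threshold is found by bisection. Assumes s >= 1 and a unique max(bcss) > 0."""
    def normalised(delta):
        u = [max(a - delta, 0.0) for a in bcss]  # soft-thresholding (bcss >= 0)
        norm = math.sqrt(sum(v * v for v in u))
        return [v / norm for v in u]

    w = normalised(0.0)
    if sum(w) <= s:  # L1 constraint already met: no thresholding needed
        return w
    lo, hi = 0.0, max(bcss)
    best = w  # fallback; replaced once a feasible threshold is found
    for _ in range(iters):
        mid = (lo + hi) / 2
        w = normalised(mid)
        if sum(w) > s:
            lo = mid  # still too dense: raise the threshold
        else:
            hi, best = mid, w  # feasible: try a smaller threshold
    return best


if __name__ == "__main__":
    # Four points on a unit square plus one distant point.
    lof = lof_scores([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)], k=2)
    print(lof, outlier_weights(lof))
    # Per-feature between-cluster scores: the third feature is thresholded to 0.
    print(lasso_feature_weights([10.0, 5.0, 1.0], s=1.2))
```

In a full method these pieces would alternate: cluster with observation-weighted k-means, recompute the per-feature between-cluster sums of squares, update the feature weights, and repeat. The quadratic distance matrix keeps the sketch short but limits it to small illustrative inputs.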
- Subjects :
- FOS: Computer and information sciences
Statistics and Probability
Clustering high-dimensional data
Computer science
NATURAL AND EXACT SCIENCES::Cybernetics [ЭБ БГУ]
02 engineering and technology
A-weighting
01 natural sciences
Methodology (stat.ME)
010104 statistics & probability
0202 electrical engineering, electronic engineering, information engineering
0101 mathematics
Cluster analysis
Statistics - Methodology
Statistic
business.industry
Applied Mathematics
k-means clustering
Pattern recognition
Function (mathematics)
Computer Science Applications
Noise
ComputingMethodologies_PATTERNRECOGNITION
Outlier
020201 artificial intelligence & image processing
Artificial intelligence
NATURAL AND EXACT SCIENCES::Mathematics [ЭБ БГУ]
business
Details
- ISSN :
- 1862-5355 and 1862-5347
- Database :
- OpenAIRE
- Journal :
- Advances in Data Analysis and Classification
- Accession number :
- edsair.doi.dedup.....18bc50e0725615fc46359c7b9c847ccf
- Full Text :
- https://doi.org/10.1007/s11634-019-00356-9