A Variant of the K-Means Clustering Algorithm for Continuous-Nominal Data

Authors :
Michal Grabowski
Aleksander Denisiuk
Source :
Advances in Intelligent Systems and Computing ISBN: 9783319262253, CORES
Publication Year :
2016
Publisher :
Springer International Publishing, 2016.

Abstract

The core idea of the proposed algorithm is to embed the dataset under consideration into a metric space. Two embeddings of the nominal part, which carries the Hamming metric, are considered: into Euclidean space (the classical approach) and into the standard unit sphere \(\mathbb S\) (our new approach). We proved that the distortion of the embedding into the unit sphere is at least 75% better than that of the classical approach. In our model, combinations of continuous and nominal data are interpreted as points of a cylinder \(\mathbb R^p\times \mathbb S\), where \(p\) is the dimension of the continuous data. We use a version of the gradient algorithm to compute centroids of finite sets on the cylinder. Experimental results show certain advantages of the new algorithm; specifically, it produces better clusters in tests with predefined groups.
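
As a rough illustration of what computing a centroid on a cylinder \(\mathbb R^p\times \mathbb S\) might look like, the sketch below treats each data point as a pair (continuous vector, unit vector). It is not the algorithm from the paper: the function name `cylinder_centroid`, the use of squared chordal distance on the sphere, the step size, and the iteration count are all assumptions made for this example. The continuous part is averaged as in ordinary k-means, and the spherical part is found by gradient descent along the tangent space followed by renormalization.

```python
import math


def normalize(v):
    """Scale a nonzero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cylinder_centroid(points, step=0.1, iters=200):
    """Hypothetical centroid of points (x, s) with x in R^p and s a unit vector.

    Assumption: the spherical part minimizes the sum of squared chordal
    distances sum_i ||c - s_i||^2 subject to ||c|| = 1.
    """
    n = len(points)
    p = len(points[0][0])
    d = len(points[0][1])

    # Continuous part: the ordinary arithmetic mean.
    mean_x = [sum(x[j] for x, _ in points) / n for j in range(p)]

    # Spherical part: projected gradient descent, started at the first point.
    total = [sum(s[j] for _, s in points) for j in range(d)]
    c = list(points[0][1])
    lr = step / n
    for _ in range(iters):
        dot = sum(t * ci for t, ci in zip(total, c))
        # Negative gradient projected onto the tangent space at c
        # (up to a factor of 2, applied in the update below).
        descent = [t - dot * ci for t, ci in zip(total, c)]
        # Take a step, then map the result back onto the unit sphere.
        c = normalize([ci + 2 * lr * g for ci, g in zip(c, descent)])
    return mean_x, c


if __name__ == "__main__":
    pts = [([0.0, 2.0], [1.0, 0.0]), ([2.0, 4.0], [0.0, 1.0])]
    print(cylinder_centroid(pts))
```

In a k-means loop built on this sketch, the assignment step would compare points to centroids using a distance on the cylinder, for instance the Euclidean distance on the continuous part combined with a distance on the sphere; how the two parts are weighted is a further modelling choice that the abstract does not specify.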

Details

ISBN :
978-3-319-26225-3
ISBNs :
9783319262253
Database :
OpenAIRE
Accession number :
edsair.doi...........b2803bb4f49b88733a708e4f3a6fc9ad
Full Text :
https://doi.org/10.1007/978-3-319-26227-7_2