A Variant of the K-Means Clustering Algorithm for Continuous-Nominal Data
- Source :
- Advances in Intelligent Systems and Computing ISBN: 9783319262253, CORES
- Publication Year :
- 2016
- Publisher :
- Springer International Publishing, 2016.
Abstract
- The core idea of the proposed algorithm is to embed the considered dataset into a metric space. Two spaces are considered for embedding the nominal part, which carries the Hamming metric: Euclidean space (the classical approach) and the standard unit sphere \(\mathbb S\) (our new approach). We proved that the distortion of the embedding into the unit sphere is at least 75% better than that of the classical approach. In our model, combinations of continuous and nominal data are interpreted as points of a cylinder \(\mathbb R^p\times \mathbb S\), where \(p\) is the dimension of the continuous data. We use a version of the gradient algorithm to compute centroids of finite sets on the cylinder. Experimental results show certain advantages of the new algorithm; specifically, it produces better clusters in tests with predefined groups.
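The abstract does not spell out the classical Euclidean embedding, so the sketch below assumes it is ordinary one-hot coding of each nominal attribute. Under that assumption, two records at Hamming distance h land at Euclidean distance sqrt(2h). The helper names (`one_hot`, `embed_nominal`, `hamming`, `euclidean`) and the toy attribute domains are hypothetical and are not taken from the paper.

```python
import math

def one_hot(value, categories):
    """One-hot code of `value` over the ordered list `categories`."""
    return [1.0 if value == c else 0.0 for c in categories]

def embed_nominal(record, domains):
    """Concatenate the one-hot codes of all nominal attributes of a record."""
    vec = []
    for value, categories in zip(record, domains):
        vec.extend(one_hot(value, categories))
    return vec

def hamming(a, b):
    """Number of attributes on which two nominal records differ."""
    return sum(x != y for x, y in zip(a, b))

def euclidean(u, v):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Hypothetical toy data: two nominal attributes (colour, size).
domains = [["red", "green", "blue"], ["small", "large"]]
a = ("red", "small")
b = ("blue", "small")
c = ("green", "large")

for x, y in [(a, b), (a, c)]:
    h = hamming(x, y)
    d = euclidean(embed_nominal(x, domains), embed_nominal(y, domains))
    # Each differing attribute contributes 2 to the squared distance.
    print(h, d, math.sqrt(2 * h))
```

Prepending the p continuous coordinates to each embedded vector would let a standard k-means run on the mixed data. The unit-sphere embedding and the gradient-based centroid computation proposed in the paper are not reproduced here, since the abstract gives no construction for them.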
- Subjects :
- Unit sphere
Discrete mathematics
Euclidean space
010102 general mathematics
Correlation clustering
Dimension (graph theory)
02 engineering and technology
01 natural sciences
Metric space
CURE data clustering algorithm
0202 electrical engineering, electronic engineering, information engineering
Embedding
020201 artificial intelligence & image processing
0101 mathematics
k-medians clustering
Mathematics
Details
- ISBN :
- 978-3-319-26225-3
- Database :
- OpenAIRE
- Accession number :
- edsair.doi...........b2803bb4f49b88733a708e4f3a6fc9ad
- Full Text :
- https://doi.org/10.1007/978-3-319-26227-7_2