Clustering strategy and method selection

Authors :
Hennig, Christian
Publication Year :
2015

Abstract

This paper is a chapter in the forthcoming Handbook of Cluster Analysis, Hennig et al. (2015). For definitions of basic clustering methods and some further methodology, other chapters of the Handbook are referred to. To read this version of the paper without the Handbook, some knowledge of cluster analysis methodology is required. The aim of this chapter is to provide a framework for all the decisions that are required when carrying out a cluster analysis in practice. A general attitude to clustering is outlined, which connects these decisions closely to the clustering aims in a given application. From this point of view, the chapter then discusses aspects of data processing such as the choice of the representation of the objects to be clustered, dissimilarity design, transformation and standardization of variables. Regarding the choice of the clustering method, it is explored how different methods correspond to different clustering aims. Then an overview of benchmarking studies comparing different clustering methods is given, as well as an outline of theoretical approaches to characterize desiderata for clustering by axioms. Finally, aspects of cluster validation, i.e., the assessment of the quality of a clustering in a given dataset, are discussed, including finding an appropriate number of clusters, testing homogeneity, internal and external cluster validation, assessing clustering stability and data visualization.

Subjects

Subjects :
Statistics - Methodology

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.1503.02059
Document Type :
Working Paper