Model-Based Clustering of High-Dimensional Data: A review

Authors :
Camille Brunet-Saumard
Charles Bouveyron
Mathématiques Appliquées Paris 5 (MAP5 - UMR 8145)
Université Paris Descartes - Paris 5 (UPD5)-Institut National des Sciences Mathématiques et de leurs Interactions (INSMI)-Centre National de la Recherche Scientifique (CNRS)
Laboratoire Angevin de Recherche en Mathématiques (LAREMA)
Université d'Angers (UA)-Centre National de la Recherche Scientifique (CNRS)
Source :
Computational Statistics and Data Analysis, Elsevier, 2013, 71, pp.52-78. ⟨10.1016/j.csda.2012.12.008⟩
Publication Year :
2013
Publisher :
HAL CCSD, 2013.

Abstract

Model-based clustering is a popular tool, renowned for its probabilistic foundations and its flexibility. However, high-dimensional data are increasingly common and, unfortunately, classical model-based clustering techniques perform poorly in high-dimensional spaces. This is mainly because model-based clustering methods are dramatically over-parametrized in this case. Nevertheless, high-dimensional spaces have specific characteristics that are useful for clustering, and recent techniques exploit those characteristics. After recalling the basics of model-based clustering, this article reviews dimension reduction approaches, regularization-based techniques, parsimonious modeling, subspace clustering methods and clustering methods based on variable selection. Existing software for model-based clustering of high-dimensional data is also reviewed, and its practical use is illustrated on real-world data sets.

Details

Language :
English
ISSN :
01679473
Database :
OpenAIRE
Journal :
Computational Statistics and Data Analysis, Elsevier, 2013, 71, pp.52-78. ⟨10.1016/j.csda.2012.12.008⟩
Accession number :
edsair.doi.dedup.....b2435c6afab90308576bf886c5fb9c09
Full Text :
https://doi.org/10.1016/j.csda.2012.12.008