
A tractable multi-partitions clustering

Authors :: Matthieu Marbac
Vincent Vandewalle
Affiliations :: CHU Lille
Université de Lille
METRICS: Evaluation des technologies de santé et des pratiques médicales - ULR 2694
Centre de Recherche en Economie et Statistique [Bruz] (CREST)
Ecole Nationale de la Statistique et de l'Analyse de l'Information [Bruz] (ENSAI)
MOdel for Data Analysis and Learning (MODAL)
Laboratoire Paul Painlevé - UMR 8524 (LPP)
Centre National de la Recherche Scientifique (CNRS)
Université de Lille, Sciences et Technologies
Inria Lille - Nord Europe
Institut National de Recherche en Informatique et en Automatique (Inria)
Centre Hospitalier Régional Universitaire [Lille] (CHRU Lille)
École polytechnique universitaire de Lille (Polytech Lille)
Source :: Computational Statistics and Data Analysis, Elsevier, 2018, ⟨10.1016/j.csda.2018.06.013⟩; COMPSTAT 2018 - 23rd International Conference on Computational Statistics, Aug 2018, Iasi, Romania
Publication Year :: 2018
Abstract: In the framework of model-based clustering, a model with several latent class variables is proposed. This model assumes that the distribution of the observed data factorizes into several independent blocks of variables, each of which follows a latent class model (i.e., a mixture model under the conditional independence assumption). The proposed model includes variable selection as a special case and can handle mixed data. Its simplicity makes it possible to estimate the partition of the variables into blocks and the mixture parameters simultaneously, thus avoiding running an EM algorithm for each possible partition of the variables into blocks. For the proposed method, a model is defined by the number of blocks, the number of clusters within each block, and the partition of the variables into blocks. Model selection can be performed with two information criteria, BIC and MICL, for which an efficient optimization procedure is proposed. The performance of the method is investigated on simulated and real data. The proposed method is shown to give a rich interpretation of the dataset at hand (i.e., analysis of the partition of the variables into blocks and of the clusters produced by each block of variables).
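
A minimal Python sketch of the model structure described in the abstract, assuming (for illustration only) that every variable is continuous with a Gaussian within-class density; categorical variables would replace the Gaussian term with a multinomial log-probability. The names `block_loglik`, `multi_partition_bic`, the encoding `omega[j]` (block of variable j) and the `params` dictionary are hypothetical and are not taken from the authors' software. For a fixed assignment of variables to blocks and fixed parameters, the log-likelihood is a sum over blocks of independent latent class log-likelihoods, so a BIC-type criterion (here in the larger-is-better form: log-likelihood minus half the number of free parameters times log n) is also a sum of per-block terms. The sketch only evaluates the criterion; it does not implement the paper's simultaneous estimation of the partition and parameters, nor MICL.

```python
import numpy as np


def log_gauss(x, mean, var):
    """Elementwise log-density of N(mean, var); Gaussian is an illustrative assumption."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def block_loglik(X_block, props, means, variances):
    """Log-likelihood of one block under a latent class model.

    X_block:   (n, d_b) data for the variables assigned to this block
    props:     (g_b,) mixing proportions
    means:     (g_b, d_b) within-class means
    variances: (g_b, d_b) within-class variances
    Conditional independence: within a class, the block's variables are
    independent, so their log-densities are summed over columns.
    """
    # comp[i, k] = log pi_k + sum_j log f(x_ij | class k), shape (n, g_b)
    comp = np.log(props)[None, :] + np.stack(
        [log_gauss(X_block, means[k], variances[k]).sum(axis=1) for k in range(len(props))],
        axis=1,
    )
    # Numerically stable log-sum-exp over classes, then sum over observations.
    m = comp.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(comp - m).sum(axis=1))).sum())


def multi_partition_bic(X, omega, params):
    """BIC-type criterion (larger is better) for a fixed variable-to-block assignment.

    omega:  length-d integer array, omega[j] = block of variable j (hypothetical encoding)
    params: dict mapping block index -> (props, means, variances)
    Blocks are independent, so the criterion is a sum of per-block terms.
    """
    n = X.shape[0]
    total = 0.0
    for b, (props, means, variances) in params.items():
        cols = np.flatnonzero(omega == b)
        g_b, d_b = means.shape
        n_params = (g_b - 1) + 2 * g_b * d_b  # proportions + means + variances
        total += block_loglik(X[:, cols], props, means, variances) - 0.5 * n_params * np.log(n)
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    omega = np.array([0, 0, 1])  # variables 0 and 1 in block 0, variable 2 in block 1
    params = {
        0: (np.array([0.5, 0.5]), np.array([[-1.0, -1.0], [1.0, 1.0]]), np.ones((2, 2))),
        1: (np.array([1.0]), np.zeros((1, 1)), np.ones((1, 1))),  # single class: no clustering
    }
    print(multi_partition_bic(X, omega, params))
```

In this sketch a block with a single class (g_b = 1) carries no clustering information, which is one way to picture the variable-selection special case mentioned in the abstract. Because the criterion is additive over blocks, moving one variable between blocks only changes the terms of the two blocks involved.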

Subjects :: Statistics and Probability
FOS: Computer and information sciences
Computer science
Information Criteria
Feature selection
02 engineering and technology
01 natural sciences
Methodology (stat.ME)
010104 statistics & probability
Model-based clustering
Block (programming)
0202 electrical engineering, electronic engineering, information engineering
0101 mathematics
Cluster analysis
Class variable
ComputingMilieux_MISCELLANEOUS
Statistics - Methodology
Model choice
Mixture model
[STAT.ME] Statistics [stat]/Methodology [stat.ME]
Applied Mathematics
Model selection
Latent class model
[STAT] Statistics [stat]
Computational Mathematics
Mixed-data
Computational Theory and Mathematics
Conditional independence
020201 artificial intelligence & image processing
Variable selection
Algorithm

Details

Language :: English
ISSN :: 01679473
Database :: OpenAIRE
Journal :: Computational Statistics and Data Analysis, Elsevier, 2018, ⟨10.1016/j.csda.2018.06.013⟩; COMPSTAT 2018 - 23rd International Conference on Computational Statistics, Aug 2018, Iasi, Romania
Accession number :: edsair.doi.dedup.....ebc2d189dfce914a35ccb5c7491e15e9
