
Mixture of Gaussians for distance estimation with missing data

Authors :
Emil Eirola
Christophe Biernacki
Vincent Vandewalle
Amaury Lendasse
School of Electrical Engineering [Aalto]
Aalto University
Laboratory of Computer and Information Science ( CIS )
Helsinki University of Technology ( TKK )
MOdel for Data Analysis and Learning ( MODAL )
Inria Lille - Nord Europe
Institut National de Recherche en Informatique et en Automatique ( Inria ) -Institut National de Recherche en Informatique et en Automatique ( Inria ) -Laboratoire Paul Painlevé - UMR 8524 ( LPP )
Université de Lille-Centre National de la Recherche Scientifique ( CNRS ) -Université de Lille-Centre National de la Recherche Scientifique ( CNRS ) -Santé publique : épidémiologie et qualité des soins-EA 2694 ( CERIM )
Université de Lille-Centre Hospitalier Régional Universitaire [Lille] ( CHRU Lille ) -Université de Lille-Centre Hospitalier Régional Universitaire [Lille] ( CHRU Lille ) -Polytech Lille-Université de Lille 1, IUT’A
Université de Lille, Droit et Santé
Laboratoire Paul Painlevé - UMR 8524 ( LPP )
Université de Lille-Centre National de la Recherche Scientifique ( CNRS )
CHU Lille
Université de Lille
School of Electrical Engineering [Aalto Univ]
Laboratory of Computer and Information Science (CIS)
TKK Helsinki University of Technology (TKK)
MOdel for Data Analysis and Learning (MODAL)
Laboratoire Paul Painlevé (LPP)
Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Université de Lille, Sciences et Technologies-Inria Lille - Nord Europe
Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Evaluation des technologies de santé et des pratiques médicales - ULR 2694 (METRICS)
Université de Lille-Centre Hospitalier Régional Universitaire [Lille] (CHRU Lille)-Université de Lille-Centre Hospitalier Régional Universitaire [Lille] (CHRU Lille)-École polytechnique universitaire de Lille (Polytech Lille)
Laboratoire Paul Painlevé - UMR 8524 (LPP)
Centre National de la Recherche Scientifique (CNRS)-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Université de Lille-Université de Lille, Sciences et Technologies-Inria Lille - Nord Europe
Source :
Neurocomputing, Elsevier, 2014, 131, pp.32-42. ⟨10.1016/j.neucom.2013.07.050⟩
Publication Year :
2014
Publisher :
Elsevier BV, 2014.

Abstract

Many data sets have missing values in practical application contexts, but the majority of commonly studied machine learning methods cannot be applied directly when there are incomplete samples. However, most such methods only depend on the relative differences between samples instead of their particular values, and thus one useful approach is to directly estimate the pairwise distances between all samples in the data set. This is accomplished by fitting a Gaussian mixture model to the data, and using it to derive estimates for the distances. A variant of the model for high-dimensional data with missing values is also studied. Experimental simulations confirm that the proposed method provides accurate estimates compared to alternative methods for estimating distances. In particular, using the mixture model for estimating distances is on average more accurate than using the same model to impute any missing values and then calculating distances. The experimental evaluation additionally shows that more accurately estimating distances leads to improved prediction performance for classification and regression tasks when the distances are used as inputs for a neural network.
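
The contrast the abstract draws between estimating distances directly and imputing first can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single Gaussian component with known mean and covariance (the paper fits a mixture), missing entries encoded as NaN, and samples that are independent draws. The function names `conditional_fill` and `expected_sq_distance` are hypothetical. Under these assumptions, the expected squared distance equals the squared distance between the conditional-mean imputations plus the summed conditional variances of the missing entries of both samples.

```python
import numpy as np


def conditional_fill(x, mean, cov):
    """Fill NaN entries of x with their conditional mean under N(mean, cov).

    Returns the filled vector and the summed conditional variance of the
    missing entries (0.0 when nothing is missing).
    """
    x = np.asarray(x, dtype=float)
    miss = np.isnan(x)
    obs = ~miss
    filled = x.copy()
    if not miss.any():
        return filled, 0.0
    if not obs.any():
        # Nothing observed: fall back to the marginal mean and variances.
        filled[:] = mean
        return filled, float(np.trace(cov))
    cov_oo = cov[np.ix_(obs, obs)]
    cov_mo = cov[np.ix_(miss, obs)]
    cov_mm = cov[np.ix_(miss, miss)]
    # gain = cov_mo @ inv(cov_oo), computed without forming the inverse.
    gain = np.linalg.solve(cov_oo, cov_mo.T).T
    filled[miss] = mean[miss] + gain @ (x[obs] - mean[obs])
    cond_cov = cov_mm - gain @ cov_mo.T
    return filled, float(np.trace(cond_cov))


def expected_sq_distance(xi, xj, mean, cov):
    """Expected squared Euclidean distance between two incomplete samples,
    assuming they are independent draws from N(mean, cov)."""
    fi, si = conditional_fill(xi, mean, cov)
    fj, sj = conditional_fill(xj, mean, cov)
    return float(np.sum((fi - fj) ** 2) + si + sj)
```

Dropping the two variance terms gives the impute-then-measure distance, which is never larger than the expected squared distance and so tends to underestimate how far apart incomplete samples are.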

Details

ISSN :
0925-2312
Volume :
131
Database :
OpenAIRE
Journal :
Neurocomputing
Accession number :
edsair.doi.dedup.....ef4a4d2d1f30776b294df3deeb82b42e