Distributed personalized imputation based on Gaussian mixture model for missing data.

Authors :: Chen, Sicong
Liu, Ying
Source :: Neural Computing & Applications. Aug2024, Vol. 36 Issue 23, p14237-14250. 14p.
Publication Year :: 2024
Abstract: Distributed machine learning has received much attention for more than two decades. Yet, it is still a challenge to achieve acceptable performance in practical scenarios when some features of data samples are missing. Although some imputation methods have been proposed for handling missing data, their performance deteriorates significantly when data are heterogeneously distributed over different nodes in the network. Considering this, in this article, we first propose a general Gaussian mixture model (GMM) consisting of both public and personalized components for modeling the homogeneous and the heterogeneous parts of data distribution, respectively. Then, we develop a distributed personalized expectation–maximization method based on knowledge transfer (KT-dpEM) to estimate the parameters of the proposed general GMM. After that, based on the estimated general GMM, missing data are imputed using the posterior conditional mean. Experimental results show that the proposed KT-dpEM algorithm has better imputation accuracy, higher robustness against different missing probabilities and better classification performance in the downstream classification tasks, compared with state-of-the-art algorithms. [ABSTRACT FROM AUTHOR]
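
Note :: The final step named in the abstract, imputation by the posterior conditional mean under a fitted GMM, can be illustrated with a minimal sketch. It assumes the mixture weights, means and covariances have already been estimated and are passed in directly. It does not reproduce the public/personalized component split or the KT-dpEM estimation procedure, whose details are not given in this record. The function name and arguments are hypothetical, chosen only for illustration.

```python
import numpy as np

def gmm_conditional_mean_impute(x, weights, means, covs):
    """Fill NaN entries of one sample x with E[x_mis | x_obs] under a given GMM.

    Assumption: weights (K,), means (K, d) and covs (K, d, d) are already
    estimated; how they are obtained is outside this sketch.
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)

    mis = np.isnan(x)          # mask of missing features
    if not mis.any():
        return x.copy()
    obs = ~mis                 # mask of observed features

    K = len(weights)
    log_resp = np.empty(K)                 # log of unnormalized responsibilities
    cond_means = np.empty((K, mis.sum()))  # per-component conditional means

    for k in range(K):
        mu_o, mu_m = means[k][obs], means[k][mis]
        S_oo = covs[k][np.ix_(obs, obs)]   # observed-observed covariance block
        S_mo = covs[k][np.ix_(mis, obs)]   # missing-observed covariance block
        diff = x[obs] - mu_o
        sol = np.linalg.solve(S_oo, diff)  # S_oo^{-1} (x_obs - mu_obs)

        # Log of weight times the Gaussian density of the observed part.
        _, logdet = np.linalg.slogdet(S_oo)
        log_resp[k] = np.log(weights[k]) - 0.5 * (
            logdet + diff @ sol + obs.sum() * np.log(2.0 * np.pi)
        )
        # Gaussian conditional mean of the missing part for component k.
        cond_means[k] = mu_m + S_mo @ sol

    # Normalize responsibilities in a numerically stable way.
    resp = np.exp(log_resp - log_resp.max())
    resp /= resp.sum()

    out = x.copy()
    out[mis] = resp @ cond_means  # responsibility-weighted conditional mean
    return out
```

With a single component, the result reduces to the familiar Gaussian conditional mean; with several components, each component's conditional mean is weighted by how well that component explains the observed features.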

Subjects :: *GAUSSIAN mixture models
*DATA distribution
*KNOWLEDGE transfer
*MACHINE learning
*MISSING data (Statistics)
*PROBABILITY theory
*MULTIPLE imputation (Statistics)
