Posterior Simulation in Countable Mixture Models for Large Datasets.

Authors :
Guha, Subharup
Source :
Journal of the American Statistical Association. Jun 2010, Vol. 105, Issue 490, pp. 775-786 (12 pp.).
Publication Year :
2010

Abstract

Mixture models, or convex combinations of a countable number of probability distributions, offer an elegant framework for inference when the population of interest can be subdivided into latent clusters having random characteristics that are heterogeneous between, but homogeneous within, the clusters. Traditionally, the different kinds of mixture models have been motivated and analyzed from very different perspectives, and their common characteristics have not been fully appreciated. The inferential techniques developed for these models usually necessitate heavy computational burdens that make them difficult, if not impossible, to apply to the massive data sets increasingly encountered in real world studies. This paper introduces a flexible class of models called generalized Pólya urn (GPU) processes. Many common mixture models, including finite mixtures, hidden Markov models, and Dirichlet processes, are obtained as special cases of GPU processes. Other important special cases include finite-dimensional Dirichlet priors, infinite hidden Markov models, analysis of densities models, nested Chinese restaurant processes, hierarchical DP models, nonparametric density models, spatial Dirichlet processes, weighted mixtures of DP priors, and nested Dirichlet processes. An investigation of the theoretical properties of GPU processes offers new insight into asymptotics that form the basis of cost-effective Markov chain Monte Carlo (MCMC) strategies for large datasets. These MCMC techniques have the advantage of providing inferences from the posterior of interest, rather than an approximation, and are applicable to different mixture models. The versatility and impressive gains of the methodology are demonstrated by simulation studies and by a semiparametric Bayesian analysis of high-resolution comparative genomic hybridization data on lung cancer. The appendixes are available online as supplemental material. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0162-1459
Volume :
105
Issue :
490
Database :
Academic Search Index
Journal :
Journal of the American Statistical Association
Publication Type :
Academic Journal
Accession number :
51980092
Full Text :
https://doi.org/10.1198/jasa.2010.tm09340