A statistical model for describing and simulating microbial community profiles.

Authors :
Ma, Siyuan
Ren, Boyu
Mallick, Himel
Moon, Yo Sup
Schwager, Emma
Maharjan, Sagun
Tickle, Timothy L.
Lu, Yiren
Carmody, Rachel N.
Franzosa, Eric A.
Janson, Lucas
Huttenhower, Curtis
Source :
PLoS Computational Biology; 9/13/2021, Vol. 17 Issue 9, p1-27, 27p, 4 Graphs
Publication Year :
2021

Abstract

Many methods have been developed for statistical analysis of microbial community profiles, but due to the complex nature of typical microbiome measurements (e.g. sparsity, zero-inflation, non-independence, and compositionality) and of the associated underlying biology, it is difficult to compare or evaluate such methods within a single systematic framework. To address this challenge, we developed SparseDOSSA (Sparse Data Observations for the Simulation of Synthetic Abundances): a statistical model of microbial ecological population structure, which can be used to parameterize real-world microbial community profiles and to simulate new, realistic profiles of known structure for methods evaluation. Specifically, SparseDOSSA's model captures marginal microbial feature abundances as a zero-inflated log-normal distribution, with additional model components for absolute cell counts, the sequence read generation process, and microbe-microbe and microbe-environment interactions. Together, these allow fully known covariance structure between synthetic features (i.e. "taxa") or between features and "phenotypes" to be simulated for method benchmarking. Here, we demonstrate SparseDOSSA's performance for 1) accurately modeling human-associated microbial population profiles; 2) generating synthetic communities with controlled population and ecological structures; 3) spiking-in true positive synthetic associations to benchmark analysis methods; and 4) recapitulating an end-to-end mouse microbiome feeding experiment. Together, these represent the most common analysis types in assessment of real microbial community environmental and epidemiological statistics, thus demonstrating SparseDOSSA's utility as a general-purpose aid for modeling communities and evaluating quantitative methods. An open-source implementation is available at http://huttenhower.sph.harvard.edu/sparsedossa2.
Author summary: Researchers have linked the human microbiome (collection of microbes residing on or within human tissues) with a wide range of health and disease conditions, establishing these microorganisms as a vital component of our well-being. However, studies on the microbiome require careful technical considerations, as invalid approaches can (and have been reported to) cause under-detections or false discoveries. To this end, a mathematical model can be used both to describe microbiomes, and to simulate how computational tools might behave for them. For example, researchers could test analysis approaches on simulated microbiomes, thus informing designs and method deployment for real-world studies. This has not previously been possible, due to multiple technical challenges with microbiome data; these are a) often constrained to sum up to a constant ("compositional"), b) enriched for zero measurements (zero-inflated), and c) composed of many, potentially interacting microbes (high-dimensional). We present a statistical model aimed at describing and simulating microbiome datasets, with components targeting all these issues, as well as the accompanying computational algorithm and implementation. The freely available implementation, named SparseDOSSA, is validated through extensive simulation and real-world examinations, and will hopefully be of broad theoretical and practical use to researchers in microbiome epidemiology and microbial ecology. [ABSTRACT FROM AUTHOR]
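
The abstract describes marginal feature abundances as zero-inflated log-normal, with compositional normalization and a sequence read generation step. The following is a minimal sketch of that idea in Python with NumPy, not the SparseDOSSA implementation or its API. The function name `simulate_profiles` and the parameters `pi0`, `mu`, `sigma`, and `read_depth` are hypothetical, and the sketch assumes features are independent, so it omits the microbe-microbe and microbe-environment interaction components the abstract mentions.

```python
import numpy as np

def simulate_profiles(n_samples, pi0, mu, sigma, read_depth, seed=0):
    """Illustrative zero-inflated log-normal simulator (hypothetical helper).

    pi0, mu, sigma: per-feature zero probability, log-scale mean, log-scale SD.
    Features are drawn independently; this is an assumption of the sketch.
    """
    rng = np.random.default_rng(seed)
    pi0 = np.asarray(pi0, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n_features = pi0.size

    # Log-normal draw for every sample/feature pair.
    abundance = rng.lognormal(mean=mu, sigma=sigma, size=(n_samples, n_features))

    # Zero inflation: feature j is absent from a sample with probability pi0[j].
    present = rng.random((n_samples, n_features)) >= pi0
    abundance = abundance * present

    # Compositional step: rescale each sample to sum to one.
    # Samples with no feature present are left as all zeros.
    totals = abundance.sum(axis=1, keepdims=True)
    relative = np.divide(abundance, totals,
                         out=np.zeros_like(abundance), where=totals > 0)

    # Read generation: multinomial sampling at a fixed sequencing depth.
    counts = np.zeros((n_samples, n_features), dtype=np.int64)
    for i in range(n_samples):
        if totals[i, 0] > 0:
            counts[i] = rng.multinomial(read_depth, relative[i])
    return relative, counts

# Example: 50 samples, 20 features, 30% structural zeros per feature.
relative, counts = simulate_profiles(
    n_samples=50,
    pi0=np.full(20, 0.3),
    mu=np.zeros(20),
    sigma=np.ones(20),
    read_depth=10_000,
)
```

Because the generating parameters are chosen by the user, the true zero rate and abundance distribution of every feature are known, which is what makes simulated profiles of this kind usable as a benchmark for analysis methods.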

Details

Language :
English
ISSN :
1553734X
Volume :
17
Issue :
9
Database :
Complementary Index
Journal :
PLoS Computational Biology
Publication Type :
Academic Journal
Accession number :
152418988
Full Text :
https://doi.org/10.1371/journal.pcbi.1008913