51. Calculating Higher-Order Moments of Phylogenetic Stochastic Mapping Summaries in Linear Time
- Author
-
Dhar, Amrit and Minin, Vladimir N
- Subjects
Applied Mathematics ,Biological Sciences ,Mathematical Sciences ,Algorithms ,Computational Biology ,Computer Simulation ,Evolution ,Molecular ,Markov Chains ,Models ,Genetic ,Phylogeny ,dynamic programming ,evolutionary conservation ,posterior predictive diagnostics ,stat.CO ,stat.ME ,Information and Computing Sciences ,Bioinformatics ,Biological sciences ,Information and computing sciences ,Mathematical sciences - Abstract
Stochastic mapping is a simulation-based method for probabilistically mapping substitution histories onto phylogenies according to continuous-time Markov models of evolution. This technique can be used to infer properties of the evolutionary process on the phylogeny and, unlike parsimony-based mapping, conditions on the observed data to randomly draw substitution mappings that do not necessarily require the minimum number of events on a tree. Most stochastic mapping applications simulate substitution mappings only to estimate the mean and/or variance of two commonly used mapping summaries: the number of particular types of substitutions (labeled substitution counts) and the time spent in a particular group of states (labeled dwelling times) on the tree. Fast, simulation-free algorithms for calculating the mean of stochastic mapping summaries exist. Importantly, these algorithms scale linearly in the number of tips/leaves of the phylogenetic tree. However, to our knowledge, no such algorithm exists for calculating higher-order moments of stochastic mapping summaries. We present one such simulation-free dynamic programming algorithm that calculates prior and posterior mapping variances and scales linearly in the number of phylogeny tips. Our procedure suggests a general framework that can be used to efficiently compute higher-order moments of stochastic mapping summaries without simulations. We demonstrate the usefulness of our algorithm by extending previously developed statistical tests for rate variation across sites and for detecting evolutionarily conserved regions in genomic sequences.
- Published
- 2017