1. ModelRevelator: Fast phylogenetic model estimation via deep learning.
- Author
- Burgstaller-Muehlbacher, Sebastian; Crotty, Stephen M.; Schmidt, Heiko A.; Reden, Franziska; Drucks, Tamara; von Haeseler, Arndt
- Subjects
- Deep learning, Artificial neural networks, Phylogenetic models, Machine learning, Evolutionary models
- Abstract
• Phylogenetic model selection can be performed using neural networks.
• A ResNet-18 neural network can be used to determine the model of sequence evolution.
• A Bi-LSTM can be used to determine presence of rate heterogeneity and estimate the alpha parameter of the Γ-distribution.
• Trees reconstructed using the models resulting from neural network estimates are closer to the ground truth than maximum likelihood trees.
• Neural network computation times are constant, thus yielding strongly decreased runtimes for alignments with long sequences and many taxa.
Selecting the best model of sequence evolution for a multiple-sequence-alignment (MSA) constitutes the first step of phylogenetic tree reconstruction. Common approaches for inferring nucleotide models typically apply maximum likelihood (ML) methods, with discrimination between models determined by one of several information criteria. This requires tree reconstruction and optimisation which can be computationally expensive. We demonstrate that neural networks can be used to perform model selection, without the need to reconstruct trees, optimise parameters, or calculate likelihoods. We introduce ModelRevelator, a model selection tool underpinned by two deep neural networks. The first neural network, NNmodelfind, recommends one of six commonly used models of sequence evolution, ranging in complexity from Jukes and Cantor to General Time Reversible. The second, NNalphafind, recommends whether or not a Γ-distributed rate heterogeneous model should be incorporated, and if so, provides an estimate of the shape parameter, α. Users can simply input an MSA into ModelRevelator, and swiftly receive output recommending the evolutionary model, inclusive of the presence or absence of rate heterogeneity, and an estimate of α.
We show that ModelRevelator performs comparably with likelihood-based methods and the recently published machine learning method ModelTeller over a wide range of parameter settings, with significant potential savings in computational effort. Further, we show that this performance is not restricted to the alignments on which the networks were trained, but is maintained even on unseen empirical data. We expect that ModelRevelator will provide a valuable alternative for phylogeneticists, especially where traditional methods of model selection are computationally prohibitive. [ABSTRACT FROM AUTHOR]
- Published
- 2023