STELLS2: fast and accurate coalescent-based maximum likelihood inference of species trees from gene tree topologies
- Source :
- Bioinformatics (Oxford, England). 33(12)
- Publication Year :
- 2016
Abstract
- Motivation: It is well known that gene trees and species trees may have different topologies. One explanation is incomplete lineage sorting, which is commonly modeled by the coalescent process. In the multispecies coalescent, a gene tree topology is observed with some probability (called the gene tree probability) for a given species tree. Gene tree probability is the main tool for the program STELLS, which finds the maximum likelihood estimate of the species tree from the given gene tree topologies. However, STELLS becomes slow as data size increases. Recently, several fast species tree inference methods have been developed that can handle large datasets. However, these methods often do not fully utilize the information in the gene trees.
- Results: In this paper, we present an algorithm (called STELLS2) for computing the gene tree probability more efficiently than the original STELLS. The key idea of STELLS2 is to take some ‘shortcuts’ during the computation and to compute the gene tree probability approximately. We apply the STELLS2 algorithm within the species tree inference approach of the original STELLS, which leads to a new maximum likelihood species tree inference method (also called STELLS2). Through simulation we demonstrate that the gene tree probabilities computed by STELLS2 and STELLS are strongly correlated. We show that STELLS2 is almost as accurate in species tree inference as STELLS. STELLS2 is also usually more accurate than several existing methods when there is one allele per species, although it is slower than these methods. STELLS2 outperforms these methods significantly when there are multiple alleles per species.
- Availability and Implementation: The program STELLS2 is available for download at: https://github.com/yufengwudcs/STELLS2
- Supplementary information: Supplementary data are available at Bioinformatics online.
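The role of the gene tree probability in maximum likelihood inference can be illustrated with a minimal sketch. It is not the STELLS or STELLS2 algorithm: it covers only the standard three-species, one-allele-per-species case, where the gene tree matching the species tree has probability 1 - (2/3)e^(-t) and each of the two discordant topologies has probability (1/3)e^(-t), with t the internal branch length in coalescent units. The function names, the grid search, and the example counts are all hypothetical and chosen for illustration.

```python
import math

# The three possible rooted topologies on species A, B, C.
TOPOLOGIES = ["((A,B),C)", "((A,C),B)", "((B,C),A)"]


def gene_tree_probability(gene_tree, species_tree, t):
    """Probability of a gene tree topology given a three-species tree.

    t is the internal branch length of the species tree in coalescent
    units. Assumes one sampled allele per species.
    """
    discordant = math.exp(-t) / 3.0
    if gene_tree == species_tree:
        return 1.0 - 2.0 * discordant
    return discordant


def log_likelihood(counts, species_tree, t):
    """Log-likelihood of observed gene tree topology counts.

    Gene trees are treated as independent, so the likelihood is the
    product of their gene tree probabilities.
    """
    return sum(
        n * math.log(gene_tree_probability(g, species_tree, t))
        for g, n in counts.items()
    )


def ml_species_tree(counts, step=0.001, t_max=10.0):
    """Grid search over topology and branch length (illustrative only)."""
    best = None
    for species_tree in TOPOLOGIES:
        for i in range(1, int(t_max / step) + 1):
            t = i * step
            ll = log_likelihood(counts, species_tree, t)
            if best is None or ll > best[0]:
                best = (ll, species_tree, t)
    return best


# Hypothetical counts of observed gene tree topologies.
counts = {"((A,B),C)": 80, "((A,C),B)": 10, "((B,C),A)": 10}
ll, tree, t_hat = ml_species_tree(counts)
print(tree, round(t_hat, 3), round(ll, 2))
```

For larger trees the gene tree probability has no such simple closed form, which is why its efficient computation is the central concern of the abstract above.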
- Subjects :
- 0106 biological sciences
0301 basic medicine
Statistics and Probability
Computer science
Computation
Inference
Network topology
010603 evolutionary biology
01 natural sciences
Biochemistry
Coalescent theory
03 medical and health sciences
Computer Simulation
Molecular Biology
Alleles
Phylogeny
Likelihood Functions
business.industry
Gene tree
Computational Biology
Pattern recognition
Sequence Analysis, DNA
Computer Science Applications
Computational Mathematics
Tree (data structure)
030104 developmental biology
Genetics, Population
Computational Theory and Mathematics
Tree rearrangement
Key (cryptography)
Artificial intelligence
business
Algorithm
Algorithms
Software
Details
- ISSN :
- 13674811
- Volume :
- 33
- Issue :
- 12
- Database :
- OpenAIRE
- Journal :
- Bioinformatics (Oxford, England)
- Accession number :
- edsair.doi.dedup.....ae842eb11cc49abafa9328b2fb83acd1