Generalized Bootstrap Supports for Phylogenetic Analyses of Protein Sequences Incorporating Alignment Uncertainty
- Source :
- Systematic Biology, Oxford University Press (OUP), 2018, 67 (6), pp.997-1009. ⟨10.1093/sysbio/syx096⟩
- Publication Year :
- 2016
Abstract
- Phylogenetic reconstructions are essential in genomics data analyses and depend on accurate multiple sequence alignment (MSA) models. We show that all currently available large-scale progressive multiple alignment methods are numerically unstable when dealing with amino-acid sequences: they produce significantly different output when the sequence input order is changed. We used the HOMFAM protein sequences dataset to show that on datasets larger than 100 sequences, this instability affects on average 21.5% of the aligned residues. The resulting Maximum Likelihood (ML) trees estimated from these MSAs are equally unstable, with over 38% of the branches being sensitive to the sequence input order. We established that about two-thirds of this uncertainty stems from the unordered nature of child nodes within the guide trees used to estimate MSAs. To quantify this uncertainty, we developed unistrap, a novel approach that estimates the combined effect of alignment uncertainty and site sampling on phylogenetic tree branch supports. Compared with the regular bootstrap procedure, unistrap provides branch support estimates that take into account a larger fraction of the parameters impacting tree instability when processing datasets containing a large number of sequences.
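The idea of combining alignment uncertainty with site sampling can be illustrated with a minimal Python sketch. This is not the unistrap implementation: the function names `resample_columns` and `combined_replicates` are hypothetical, and the sketch assumes the alternative alignments (for example, obtained by re-running an aligner on shuffled input) have already been computed and are passed in as dictionaries mapping sequence names to gapped strings.

```python
import random


def resample_columns(alignment, rng):
    """Bootstrap one alignment by drawing its columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    # Draw as many column indices as the alignment has columns.
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in cols) for name in names}


def combined_replicates(alignments, n_replicates, seed=0):
    """Yield bootstrap replicates taken in turn from alternative alignments.

    Assumption: `alignments` is a list of MSAs of the same sequences, each a
    dict mapping a sequence name to its gapped string. Cycling through them
    lets both sources of variation (alignment and site sampling) contribute.
    """
    rng = random.Random(seed)
    for i in range(n_replicates):
        source = alignments[i % len(alignments)]
        yield resample_columns(source, rng)
```

Each replicate would then be passed to a tree-building program, and the support of a branch taken as the fraction of replicate trees that contain it. Cycling through the alignments in a fixed order is one simple design choice among several; drawing an alignment at random per replicate would be another.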
- Subjects :
- 0106 biological sciences
0301 basic medicine
Sequence analysis
Sequence alignment
Biology
[SDV.BID.SPT]Life Sciences [q-bio]/Biodiversity/Systematics, Phylogenetics and taxonomy
010603 evolutionary biology
01 natural sciences
03 medical and health sciences
Genetics
Fraction (mathematics)
Ecology, Evolution, Behavior and Systematics
Phylogeny
Sequence
Multiple sequence alignment
Phylogenetic tree
Models, Genetic
[SDV.BID.EVO]Life Sciences [q-bio]/Biodiversity/Populations and Evolution [q-bio.PE]
Uncertainty
Sampling (statistics)
Proteins
Classification
Tree (data structure)
030104 developmental biology
Bootstrap analysis
[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]
Algorithm
Sequence Alignment
Software
Details
- ISSN :
- 1076-836X and 1063-5157
- Volume :
- 67
- Issue :
- 6
- Database :
- OpenAIRE
- Journal :
- Systematic biology
- Accession number :
- edsair.doi.dedup.....55c08448f453d12a6271bf9413b571d6