Generalized Bootstrap Supports for Phylogenetic Analyses of Protein Sequences Incorporating Alignment Uncertainty

Authors :
Olivier Gascuel
Maria Chatzou
Paolo Di Tommaso
Cedric Notredame
Evan Floden
Centre for Genomic Regulation [Barcelona] (CRG)
Universitat Pompeu Fabra [Barcelona] (UPF)-Centro Nacional de Analisis Genomico [Barcelona] (CNAG)
Méthodes et Algorithmes pour la Bioinformatique (MAB)
Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM)
Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)
Bioinformatique évolutive - Evolutionary Bioinformatics
Institut Pasteur [Paris]-Centre National de la Recherche Scientifique (CNRS)
We acknowledge the support of the Spanish Ministry of Economy and Competitiveness, ‘Centro de Excelencia Severo Ochoa 2013–2017’, and of the CERCA Programme/Generalitat de Catalunya.
Source :
Systematic Biology, Oxford University Press (OUP), 2018, 67 (6), pp.997-1009. ⟨10.1093/sysbio/syx096⟩
Publication Year :
2016

Abstract

Phylogenetic reconstructions are essential in genomic data analyses and depend on accurate multiple sequence alignment (MSA) models. We show that all currently available large-scale progressive multiple alignment methods are numerically unstable when dealing with amino-acid sequences: they produce significantly different output when the sequence input order is changed. We used the HOMFAM protein sequence dataset to show that, on datasets larger than 100 sequences, this instability affects on average 21.5% of the aligned residues. The Maximum Likelihood (ML) trees estimated from these MSAs are equally unstable, with over 38% of the branches being sensitive to the sequence input order. We established that about two-thirds of this uncertainty stems from the unordered nature of child nodes within the guide trees used to estimate MSAs. To quantify this uncertainty we developed unistrap, a novel approach that estimates the combined effect of alignment uncertainty and site sampling on phylogenetic tree branch supports. Compared with the regular bootstrap procedure, unistrap provides branch support estimates that take into account a larger fraction of the parameters impacting tree instability when processing datasets containing a large number of sequences.
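
The idea of combining alignment uncertainty with site sampling can be illustrated with a minimal Python sketch. It is not the unistrap implementation, and the abstract does not specify the procedure at this level of detail. The function names (`shuffled_inputs`, `resample_columns`, `combined_replicates`, `branch_support`) are hypothetical, and the sketch assumes that alternative alignments of the same sequences (for example, obtained by realigning shuffled inputs with an external aligner) are already available as dictionaries mapping sequence names to aligned strings. Tree inference is left to external tools.

```python
import random


def shuffled_inputs(sequences, n_replicates, seed=0):
    """Return n_replicates copies of the input list in random order.

    Each copy would be handed to an external aligner (not shown here).
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        order = list(sequences)
        rng.shuffle(order)
        replicates.append(order)
    return replicates


def resample_columns(alignment, rng):
    """Site bootstrap: draw alignment columns with replacement.

    alignment maps sequence name -> aligned string (equal lengths).
    """
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(row[c] for c in cols)
            for name, row in alignment.items()}


def combined_replicates(alignments, n_replicates, seed=0):
    """Assumed scheme: pick one alternative alignment at random for each
    replicate, then resample its columns, so both sources of variation
    contribute to the replicate set."""
    rng = random.Random(seed)
    return [resample_columns(rng.choice(alignments), rng)
            for _ in range(n_replicates)]


def branch_support(reference_splits, replicate_splits):
    """Fraction of replicate trees that contain each reference branch.

    Branches are represented as frozensets of taxon names (bipartitions);
    replicate_splits is a list with one set of bipartitions per tree.
    """
    n = len(replicate_splits)
    return {split: sum(split in tree for tree in replicate_splits) / n
            for split in reference_splits}
```

In use, each replicate returned by `combined_replicates` would be passed to a tree-building program, the resulting trees converted to sets of bipartitions, and `branch_support` applied against the reference tree.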

Details

ISSN :
1076-836X and 1063-5157
Volume :
67
Issue :
6
Database :
OpenAIRE
Journal :
Systematic biology
Accession number :
edsair.doi.dedup.....55c08448f453d12a6271bf9413b571d6