Large multiple sequence alignments with a root-to-leaf regressive method
- Source :
- Nature Biotechnology
- Publication Year :
- 2019
Abstract
- Multiple sequence alignments (MSAs) are used for structural [1,2] and evolutionary predictions [1,2], but the complexity of aligning large datasets requires the use of approximate solutions [3], including the progressive algorithm [4]. Progressive MSA methods start by aligning the most similar sequences and subsequently incorporate the remaining sequences, from leaf to root, based on a guide tree. Their accuracy declines substantially as the number of sequences is scaled up [5]. We introduce a regressive algorithm that enables MSA of up to 1.4 million sequences on a standard workstation and substantially improves accuracy on datasets larger than 10,000 sequences. Our regressive algorithm works in the opposite direction to the progressive algorithm and begins by aligning the most dissimilar sequences. It uses an efficient divide-and-conquer strategy to run third-party alignment methods in linear time, regardless of their original complexity. Our approach will enable analyses of extremely large genomic datasets such as the recently announced Earth BioGenome Project, which comprises 1.5 million eukaryotic genomes [6].
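The abstract describes the regressive divide-and-conquer step only at a high level. The Python sketch below illustrates one possible reading of it and is not the authors' implementation. It assumes a binary guide tree, assumes the longest sequence in a subtree serves as that subtree's representative, and assumes subtrees are cut breadth-first from the root. The names `Node`, `split_from_root`, `regressive_groups` and the maximum group size `n` are hypothetical. The sketch only builds the groups of sequences; aligning each group with a third-party aligner and merging the group alignments through their shared sequence are omitted, and the tree bookkeeping is not optimized.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # eq=False: nodes compare by identity, not by field values
class Node:
    """Node of a rooted guide tree (hypothetical type); leaves carry a named sequence."""
    name: Optional[str] = None
    seq: str = ""
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: Node) -> List[Node]:
    """Leaves of a subtree in left-to-right order (iterative, so deep trees are fine)."""
    out, stack = [], [node]
    while stack:
        cur = stack.pop()
        if cur.is_leaf():
            out.append(cur)
        else:
            stack.extend(reversed(cur.children))
    return out


def representative(node: Node) -> Node:
    """Assumption: the longest sequence stands in for its whole subtree."""
    return max(leaves(node), key=lambda leaf: len(leaf.seq))


def split_from_root(node: Node, n: int) -> List[Node]:
    """Cut a subtree into at most n pieces, expanding nodes closest to its root first."""
    pieces = [node]
    queue = deque([node] if not node.is_leaf() else [])
    while queue and len(pieces) - 1 + len(queue[0].children) <= n:
        cur = queue.popleft()
        pieces.remove(cur)  # identity-based removal because eq=False
        pieces.extend(cur.children)
        queue.extend(c for c in cur.children if not c.is_leaf())
    return pieces


def regressive_groups(root: Node, n: int) -> List[List[str]]:
    """Hypothetical helper: groups of at most n names, each to be aligned separately."""
    if n < 2:
        raise ValueError("n must be at least 2")
    groups: List[List[str]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.is_leaf():
            continue
        pieces = split_from_root(node, n)
        if len(pieces) < 2:
            raise ValueError("a node has more than n children; resolve polytomies first")
        # Pieces cut near the root are distant clades of the guide tree, so the
        # first group collects the most dissimilar sequences.
        groups.append([representative(p).name for p in pieces])
        # Each internal piece is handled on its own; its representative reappears
        # in that piece's group and is the shared sequence used to merge alignments.
        stack.extend(p for p in pieces if not p.is_leaf())
    return groups


if __name__ == "__main__":
    a = Node(name="a", seq="ACGT")
    b = Node(name="b", seq="ACG")
    c = Node(name="c", seq="ACGTT")
    d = Node(name="d", seq="AC")
    tree = Node(children=[Node(children=[a, b]), Node(children=[c, d])])
    print(regressive_groups(tree, 2))  # prints [['a', 'c'], ['c', 'd'], ['a', 'b']]
```

Under these assumptions, every group holds between 2 and n sequences, and for L sequences the group sizes minus one sum to L - 1, so there are at most L - 1 groups. Each group costs at most one aligner call on n sequences. For a fixed n, the number of aligner calls therefore grows linearly with L, whatever the aligner's own complexity. This matches the intuition behind the linear-time claim in the abstract.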
- Subjects :
- Root (linguistics)
Workstation
Computer science
Biomedical Engineering
Bioengineering
Applied Microbiology and Biotechnology
Article
law.invention
03 medical and health sciences
0302 clinical medicine
law
Databases, Genetic
Time complexity
030304 developmental biology
0303 health sciences
Sequence
Eukaryota
Genomics
Tree (data structure)
Regression Analysis
Molecular Medicine
Sequence Alignment
Algorithm
Algorithms
030217 neurology & neurosurgery
Biotechnology
Details
- ISSN :
- 1087-0156
- Database :
- OpenAIRE
- Journal :
- Nature Biotechnology
- Accession number :
- edsair.doi.dedup.....61a91533710af045aef21280db06bc83
- Full Text :
- https://doi.org/10.1038/s41587-019-0333-6