Treemmer: a tool to reduce large phylogenetic datasets with minimal loss of diversity.
- Author
- Menardo F, Loiseau C, Brites D, Coscolla M, Gygli SM, Rutaihwa LK, Trauner A, Beisel C, Borrell S, and Gagneux S
- Subjects
- Algorithms; Databases, Genetic; Humans; Information Storage and Retrieval; Computational Biology/methods; Influenza A virus/genetics; Mycobacterium tuberculosis/genetics; Phylogeny; Software
- Abstract
- Background: Large sequence datasets are difficult to visualize and handle. Additionally, they often do not represent a random subset of the natural diversity, but the result of uncoordinated and convenience sampling. Consequently, they can suffer from redundancy and sampling biases. Results: Here we present Treemmer, a simple tool to evaluate the redundancy of phylogenetic trees and reduce their complexity by eliminating leaves that contribute the least to the tree diversity. Conclusions: Treemmer can reduce the size of datasets with different phylogenetic structures and levels of redundancy while maintaining a sub-sample that is representative of the original diversity. Additionally, it is possible to fine-tune the behavior of Treemmer including any kind of meta-information, making Treemmer particularly useful for empirical studies.
- Published
- 2018