1. Phylo2Vec: a vector representation for binary trees
- Author
Penn, Matthew J; Scheidwasser, Neil; Khurana, Mark P; Duchêne, David A; Donnelly, Christl A; Bhatt, Samir
- Subjects
Quantitative Biology - Populations and Evolution, Computer Science - Machine Learning, Quantitative Biology - Quantitative Methods
- Abstract
Binary phylogenetic trees inferred from biological data are central to understanding the shared history among evolutionary units. However, inferring the placement of latent nodes in a tree is computationally expensive. State-of-the-art methods rely on carefully designed heuristics for tree search, using different data structures for easy manipulation (e.g., classes in object-oriented programming languages) and readable representation of trees (e.g., Newick-format strings). Here, we present Phylo2Vec, a parsimonious encoding for phylogenetic trees that serves as a unified approach for both manipulating and representing phylogenetic trees. Phylo2Vec maps any binary tree with $n$ leaves to a unique integer vector of length $n-1$. The advantages of Phylo2Vec are fourfold: (i) fast tree sampling; (ii) a compressed tree representation compared to a Newick string; (iii) quick and unambiguous verification of whether two binary trees are topologically identical; and (iv) a systematic ability to traverse tree space in very large or small jumps. As a proof of concept, we use Phylo2Vec for maximum likelihood inference on five real-world datasets and show that a simple hill-climbing-based optimisation scheme can efficiently traverse the vastness of tree space from a random to an optimal tree.
- Comment
38 pages, 9 figures, 1 table, 2 supplementary figures
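The encoding described in the abstract can be illustrated with a minimal sketch. It assumes that, with 0-based indexing, entry `i` of a valid vector lies in `{0, ..., 2i}`; the abstract does not state this constraint, so treat it as an assumption. The function names below are hypothetical and are not the authors' API. Under that assumption, sampling a tree reduces to drawing `n - 1` independent integers, and the number of valid vectors, `1 * 3 * ... * (2n - 3)`, matches the number of rooted binary trees on `n` labelled leaves.

```python
import random
from math import prod


def sample_vector(n_leaves, rng=None):
    """Draw a random integer vector of length n_leaves - 1.

    Assumption (not stated in the abstract): entry i is uniform on
    {0, ..., 2i}, so the first entry is always 0.
    """
    rng = rng or random.Random()
    # random.Random.randint includes both endpoints.
    return [rng.randint(0, 2 * i) for i in range(n_leaves - 1)]


def count_vectors(n_leaves):
    """Count the vectors satisfying the assumed constraint: 1 * 3 * ... * (2n - 3)."""
    return prod(2 * i + 1 for i in range(n_leaves - 1))


def same_topology(v, w):
    """Compare two trees via their vectors.

    Because each tree maps to a unique vector, equality of vectors
    (for the same leaf labelling) stands in for topological equality.
    """
    return list(v) == list(w)
```

For example, `count_vectors(4)` gives 15, the number of rooted binary trees on four labelled leaves, and `same_topology` is a plain element-wise comparison rather than a tree traversal.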
- Published
- 2023