733 results on '"Tree rearrangement"'
Search Results
2. Extremal Distances for Subtree Transfer Operations in Binary Trees.
- Author
-
Atkins, Ross and McDiarmid, Colin
- Subjects
- *
TREE graphs , *EXTREMAL combinatorics - Abstract
Three standard subtree transfer operations for binary trees, used in particular for phylogenetic trees, are: tree bisection and reconnection (TBR), subtree prune and regraft (SPR), and rooted subtree prune and regraft (rSPR). We show that for a pair of leaf-labelled binary trees with n leaves, the maximum number of such moves required to transform one into the other is n-Θ(n), extending a result of Ding, Grünewald, and Humphries, and this holds also if one of the trees is fixed arbitrarily. If the pair is chosen uniformly at random, then the expected number of moves required is n-Θ(n2/3). These results may be phrased in terms of agreement forests: we also give extensions for more than two binary trees. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
3. Tree thinking, time and topology: comments on the interpretation of tree diagrams in evolutionary/phylogenetic systematics
- Author
-
János Podani
- Subjects
Systematics ,Taxon ,Phylogenetic tree ,Cladogram ,Ecology ,Evolutionary biology ,Tree rearrangement ,Path (graph theory) ,Biology ,Tree (graph theory) ,Ecology, Evolution, Behavior and Systematics ,Anagenesis - Abstract
This paper presents a graph theoretical overview of tree diagrams applied extensively in systematic biology. Simple evolutionary models involving three speciation processes (splitting, budding and anagenesis) are used for evaluating the ability of different rooted trees to demonstrate temporal and ancestor-descendant relationships within or among species. On this basis, they are classified into four types: (i) diachronous trees depict evolutionary history faithfully because the order of nodes along any path agrees with the temporal sequence of respective populations or species, (ii) achronous trees show ancestor-descendant relationships for species or higher taxa such that the time aspect is disregarded, (iii) synchronous trees attempt to reveal evolutionary pathways and/or distributional patterns of apomorphic characters for organisms living at the same point of time, and (iv) asynchronous trees may do the same regardless the time of origin (e.g. when extant and extinct species are evaluated together). Trees of the last two types are cladograms, the synchronous ones emphasizing predominantly-but not exclusively-the evolutionary process within a group, while asynchronous cladograms are usually focused on pattern and infrequently on process. Historical comments and the examples demonstrate that each of these tree types is useful on its own right in evolutionary biology and systematics. In practice, separation among them is not sharp, and their features are often combined into eclectic tree forms whose interpretation is not entirely free from problems.
- Published
- 2021
4. A Memetic Algorithm Based on an NSGA-II Scheme for Phylogenetic Tree Inference
- Author
-
Márcio Dorn, Manuel Villalobos-Cid, Mario Inostroza-Ponta, and Rodrigo Ligabue-Braun
- Subjects
Mathematical optimization ,Optimization problem ,Phylogenetic tree ,Computer science ,business.industry ,Inference ,02 engineering and technology ,Multi-objective optimization ,Theoretical Computer Science ,Tree (data structure) ,Computational Theory and Mathematics ,Tree rearrangement ,0202 electrical engineering, electronic engineering, information engineering ,Memetic algorithm ,020201 artificial intelligence & image processing ,Local search (optimization) ,business ,Software - Abstract
Phylogenetic inference allows building a hypothesis about the evolutionary relationships between a group of species, which is usually represented as a tree. The phylogenetic inference problem can be seen as an optimization problem, searching for the most qualified tree among all the possible topologies according to a selected criterion. These criteria can be based on different principles. Due to the combinatorial number of possible topologies, diverse heuristics and meta-heuristics have been proposed to find approximated solutions according to one criterion. However, these methods may result in several phylogeny trees which could be in conflict with one another. In order to deal with this problem, models based on multiobjective optimization with different configurations have been used. In this paper, we propose an ad-hoc multiobjective memetic algorithm (MO-MA) to infer phylogeny using two objectives: 1) maximum parsimony and 2) likelihood. Several population operators and local search strategies are proposed and evaluated in order to measure their contribution to the algorithm. Additionally, we perform a comparison among different configurations and tree rearrangement strategies. The results show that the proposed MO-MA is able to identify a Pareto set of solutions that include new trees which were nondominated by solutions from the current state of the art single-objective optimization tools. Furthermore, the MO-MA improves the results presented in the literature for multiobjective approaches in all of the studied data sets. These results make our proposal a good alternative for phylogenetic inference.
- Published
- 2019
- Full Text
- View/download PDF
5. Exact Algorithms for Duplication-Transfer-Loss Reconciliation with Non-Binary Gene Trees
- Author
-
Misagh Kordi and Mukul S. Bansal
- Subjects
0301 basic medicine ,Theoretical computer science ,Gene Transfer, Horizontal ,Computational complexity theory ,Computer science ,0206 medical engineering ,Binary number ,02 engineering and technology ,Biology ,Network topology ,Evolutionary computation ,Evolution, Molecular ,Set (abstract data type) ,03 medical and health sciences ,Transfer (computing) ,Gene Duplication ,Gene duplication ,Genetic algorithm ,Genetics ,Gene ,Phylogeny ,030102 biochemistry & molecular biology ,Bacteria ,Models, Genetic ,Applied Mathematics ,Uncertainty ,Computational Biology ,Genomics ,030104 developmental biology ,ComputingMethodologies_PATTERNRECOGNITION ,Genes, Bacterial ,Tree rearrangement ,Multigene Family ,Horizontal gene transfer ,Algorithm design ,ComputingMethodologies_GENERAL ,Algorithm ,Algorithms ,Software ,020602 bioinformatics ,Biotechnology - Abstract
Duplication-Transfer-Loss (DTL) reconciliation is a powerful method for studying gene family evolution in the presence of horizontal gene transfer. DTL reconciliation seeks to reconcile gene trees with species trees by postulating speciation, duplication, transfer, and loss events. Efficient algorithms exist for finding optimal DTL reconciliations when the gene tree is binary. In practice, however, gene trees are often non-binary due to uncertainty in the gene tree topologies, and DTL reconciliation with non-binary gene trees is known to be NP-hard. In this paper, we present the first, exact algorithms for DTL reconciliation with non-binary gene trees. Specifically, we (i) show that the DTL reconciliation problem for non-binary gene trees is fixed-parameter tractable in the maximum degree of the gene tree, (ii) present an exponential-time, but in-practice efficient, algorithm to track and enumerate all optimal binary resolutions of an unresolved input gene tree, and (iii) apply our algorithms to a large empirical dataset of over 4700 gene trees from 100 species to study the impact of gene tree uncertainty on DTL-reconciliation and to demonstrate the applicability and utility of our algorithms. The new techniques and algorithms introduced in this paper make it possible, for the first time, to systematically calculate and negate the impact of gene tree uncertainty on reconciliation accuracy, and will help biologists avoid incorrect evolutionary inferences caused by gene tree uncertainty.
- Published
- 2019
- Full Text
- View/download PDF
6. Evolutionary Algorithms in Decision Tree Induction
- Author
-
Raffaele Miele, Francesco Mola, and Claudio Conversano
- Subjects
Incremental decision tree ,Decision support system ,Theoretical computer science ,Computer science ,Tree rearrangement ,Ant colony optimization algorithms ,Evolutionary algorithm ,Decision tree ,ID3 algorithm ,Combinatorial optimization - Abstract
One of the biggest problem that many data analysis techniques have to deal with nowadays is Combinatorial Optimization that, in the past, has led many methods to be taken apart. Actually, the (still not enough!) higher computing power available makes it possible to apply such techniques within certain bounds. Since other research fields like Artificial Intelligence have been (and still are) dealing with such problems, their contribute to statistics has been very significant. This chapter tries to cast the Combinatorial Optimization methods into the Artificial Intelligence framework, particularly with respect Decision Tree Induction, which is considered a powerful instrument for the knowledge extraction and the decision making support. When the exhaustive enumeration and evaluation of all the possible candidate solution to a Tree-based Induction problem is not computationally affordable, the use of Nature Inspired Optimization Algorithms, which have been proven to be powerful instruments for attacking many combinatorial optimization problems, can be of great help. In this respect, the attention is focused on three main problems involving Decision Tree Induction by mainly focusing the attention on the Classification and Regression Tree-CART (Breiman et al., 1984) algorithm. First, the problem of splitting complex predictors such a multi-attribute ones is faced through the use of Genetic Algorithms. In addition, the possibility of growing “optimal” exploratory trees is also investigated by making use of Ant Colony Optimization (ACO) algorithm. Finally, the derivation of a subset of decision trees for modelling multi-attribute response on the basis of a data-driven heuristic is also described. The proposed approaches might be useful for knowledge extraction from large databases as well as for data mining applications. The solution they offer for complicated data modelling and data analysis problems might be considered for a possible implementation in a Decision Support System (DSS). The remainder of the chapter is as follows. Section 2 describes the main features and the recent developments of Decision Tree Induction. An overview of Combinatorial Optimization with a particular focus on Genetic Algorithms and Ant Colony Optimization is presented in section 3. The use of these two algorithms within the Decision Tree Induction Framework is described in section 4, together with the description of the algorithm for modelling multi-attribute response. Section 5 summarizes the results of the proposed O pe n A cc es s D at ab as e w w w .ite ch on lin e. co m
- Published
- 2021
7. A Multi-Criterion Evolutionary Approach Applied to Phylogenetic Reconstruction
- Author
-
W. Cancino and A.C.B. Delbem
- Subjects
Tree (data structure) ,Phylogenetic tree ,Optimality criterion ,Tree rearrangement ,Evolutionary algorithm ,Context (language use) ,Algorithm ,Neighbor joining ,Mathematics ,Maximum parsimony - Abstract
In this paper, we proposed an MOEA approach, called PhyloMOEA which solves the phylogenetic inference problem using maximum parsimony and maximum likelihood criteria. The PhyloMOEA's development was motivated by several studies in the literature (Huelsenbeck, 1995; Jin & Nei, 1990; Kuhner & Felsenstein, 1994; Tateno et al., 1994), which point out that various phylogenetic inference methods lead to inconsistent solutions. Techniques using parsimony and likelihood criteria yield to different trees when they are applied separately to the four nucleotide datasets used in the experiments. On the other hand, PhyloMOEA was applied to the four datasets and found a set of trees that represents a trade-off between these criteria. POS and FS trees obtained by PhyloMOEA were statistically evaluated using the SH-test. The results of this test suggest that several PhyloMOEA solutions are consistent with the criteria used. It is important to observe that the PhyloMOEA trees are not directly comparable with trees obtained by other phylogenetic reconstruction programs since these programs consider only one optimality criterion. Moreover, support values for clades included in trees obtained by PhyloMOEA were calculated. The clades were classified into several types according to the type of trees the clade is in: maximum parsimony, maximum likelihood or intermediate trees. Support values were compared with clade posterior probabilities reported by Mr.Bayes for the four test datasets used. The results show that PhyloMOEA clade support closely approximates Mr.Bayes posterior probabilities if the clades found in the set of trees correspond to intermediate and maximum likelihood/maximum parsimony trees. Despite the relevant results found by PhyloMOEA, there are aspects that could be addressed in order to improve the algorithm and corresponding results: · PhyloMOEA requires several hours to find acceptable Pareto-solutions if initial trees are poorly estimated. This problem can be improved taking into account local search strategies (Guindon & Gascuel, 2003; Stamatakis & Meier, 2004). PhyloMOEA's performance is also decreased by the likelihood calculation, which is computationally intensive. As mentioned in Section 5.3, there are other techniques that address this problem (Larget & Simon, 1998; Stamatakis & Meier, 2004); · The proposed algorithm does not optimize parameters of the evolution model employed in the likelihood calculation. These values can be included in each solution such that they can be optimized during the algorithm execution (Lewis, 1998); · PhyloMOEA uses only Fitch parsimony which has a unitary state change cost matrix. The use of more complex parsimony models or even generalized parsimony can improve the results (Swofford et al., 1996); · Clade support obtained from PhyloMOEA trees can be also compared with bootstrap support values. A bootstrap analysis, using parsimony and likelihood criteria separately, enables the separation of clades that best support the maximum parsimony and maximum likelihood trees. This could lead to a better comparison between PhyloMOEA and bootstrap clade support values; · This research has not investigated the metrics for convergence and diversity of the obtained Pareto front. Measurements for convergence are difficult to obtain since the Pareto front is unknown in this case. On the other hand, various diversity metrics found in the literature (Deb, 2001) can be investigated; The experiments have shown that PhyloMOEA can make relevant contributions to phylogenetic inference. Moreover, there are remaining aspects that can be investigated to improve the current approach.
- Published
- 2021
8. Computing nearest neighbour interchange distances between ranked phylogenetic trees
- Author
-
Lena Collienne and Alex Gavryushkin
- Subjects
0106 biological sciences ,FOS: Computer and information sciences ,68W40, 68Q25, 03D15 ,Computational complexity theory ,Computer science ,Inference ,Computational Complexity (cs.CC) ,92B05 ,010603 evolutionary biology ,01 natural sciences ,Article ,Combinatorics ,03 medical and health sciences ,Quadratic equation ,Computer Science - Data Structures and Algorithms ,Rank (graph theory) ,Cluster Analysis ,Data Structures and Algorithms (cs.DS) ,Phylogeny ,030304 developmental biology ,0303 health sciences ,Phylogenetic tree ,Models, Genetic ,Applied Mathematics ,68Q25 ,Computational Biology ,Agricultural and Biological Sciences (miscellaneous) ,Tree (data structure) ,Computer Science - Computational Complexity ,Modeling and Simulation ,Tree rearrangement ,Shortest path problem ,F.2.0 ,Algorithms - Abstract
Many popular algorithms for searching the space of leaf-labelled (phylogenetic) trees are based on tree rearrangement operations. Under any such operation, the problem is reduced to searching a graph where vertices are trees and (undirected) edges are given by pairs of trees connected by one rearrangement operation (sometimes called a move). Most popular are the classical nearest neighbour interchange, subtree prune and regraft, and tree bisection and reconnection moves. The problem of computing distances, however, is $${\mathbf {N}}{\mathbf {P}}$$ N P -hard in each of these graphs, making tree inference and comparison algorithms challenging to design in practice. Although ranked phylogenetic trees are one of the central objects of interest in applications such as cancer research, immunology, and epidemiology, the computational complexity of the shortest path problem for these trees remained unsolved for decades. In this paper, we settle this problem for the ranked nearest neighbour interchange operation by establishing that the complexity depends on the weight difference between the two types of tree rearrangements (rank moves and edge moves), and varies from quadratic, which is the lowest possible complexity for this problem, to $${\mathbf {N}}{\mathbf {P}}$$ N P -hard, which is the highest. In particular, our result provides the first example of a phylogenetic tree rearrangement operation for which shortest paths, and hence the distance, can be computed efficiently. Specifically, our algorithm scales to trees with tens of thousands of leaves (and likely hundreds of thousands if implemented efficiently).
- Published
- 2020
9. On the Neighborhoods of Trees.
- Author
-
Humphries, Peter J. and Wu, Taoyang
- Abstract
Tree rearrangement operations typically induce a metric on the space of phylogenetic trees. One important property of these metrics is the size of the neighborhood, that is, the number of trees exactly one operation from a given tree. We present an exact expression for the size of the TBR (tree bisection and reconnection) neighborhood, thus answering a question first posed by Allen and Steel . In addition, we also obtain a characterization of the extremal trees whose TBR neighborhoods are maximized and minimized. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
10. Combinatorics of distance-based tree inference.
- Author
-
Pardi, Fabio and Gascuel, Olivier
- Subjects
- *
PHYLOGENY , *COMBINATORICS , *TOPOLOGY , *LEAST squares , *HEURISTIC algorithms - Abstract
Several popular methods for phylogenetic inference (or hierarchical clustering) are based on a matrix of pairwise distances between taxa (or any kind of objects): The objective is to construct a tree with branch lengths so that the distances between the leaves in that tree are as close as possible to the input distances. If we hold the structure (topology) of the tree fixed, in some relevant cases (e.g., ordinary least squares) the optimal values for the branch lengths can be expressed using simple combinatorial formulae. Here we define a general form for these formulae and show that they all have two desirable properties: First, the common tree reconstruction approaches (least squares, minimum evolution), when used in combination with these formulae, are guaranteed to infer the correct tree when given enough data (consistency); second, the branch lengths of all the simple (nearest neighbor interchange) rearrangements of a tree can be calculated, optimally, in quadratic time in the size of the tree, thus allowing the efficient application of hill climbing heuristics. The study presented here is a continuation of that by Mihaescu and Pachter on branch length estimation [Mihaescu R, Pachter L (2008) Proc Natl Acad Sei USA 105:13206-13211]. The focus here is on the inference of the tree itself and on providing a basis for novel algorithms to reconstruct trees from distances [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
11. On agreement forests
- Author
-
Ding, Yang, Grünewald, Stefan, and Humphries, Peter J.
- Subjects
- *
TREE graphs , *SET theory , *BIOINFORMATICS , *MATHEMATICAL analysis , *PHYLOGENY , *GRAPH theory - Abstract
Abstract: Tree rearrangement operations are widely used to measure the dissimilarity between phylogenetic trees with identical leaf sets. The tree bisection and reconnection (tbr) distance for unrooted trees can be equivalently defined in terms of agreement forests. For both the tbr distance and the less general subtree prune and regraft (spr) distance, we use such forests to derive new upper and lower bounds on the maximal possible distance between two trees with n leaves. [Copyright &y& Elsevier]
- Published
- 2011
- Full Text
- View/download PDF
12. Optimal depth-first algorithms and equilibria of independent distributions on multi-branching trees
- Author
-
Keng Meng Ng, NingNing Peng, Yue Yang, Kazuyuki Tanaka, and Weiguang Peng
- Subjects
Discrete mathematics ,Binary tree ,Computational complexity theory ,010102 general mathematics ,Weight-balanced tree ,0102 computer and information sciences ,01 natural sciences ,Computer Science Applications ,Theoretical Computer Science ,Combinatorics ,Branching (linguistics) ,010201 computation theory & mathematics ,Tree rearrangement ,Signal Processing ,Probabilistic analysis of algorithms ,Depth-first search ,0101 mathematics ,Algorithm ,Information Systems ,Mathematics ,Analysis of algorithms - Abstract
The main purpose of this paper is to answer two questions about the distributional complexity of multi-branching trees. We first show that for any independent distribution d on assignments for a multi-branching tree, a certain directional algorithm DIR d is optimal among all the depth-first algorithms (including non-directional ones) with respect to d. We next generalize Suzuki–Niida's result on binary trees to the case of multi-branching trees. By means of this result and our optimal algorithm, we show that for any balanced multi-branching AND–OR tree, the optimal distributional complexity among all the independent distributions (ID) is (under an assumption that the probability of the root having value 0 is neither 0 nor 1) actually achieved by an independent and identical distribution (IID).
- Published
- 2017
- Full Text
- View/download PDF
13. Reconciliation With Nonbinary Gene Trees Revisited
- Author
-
Yu Zheng and Louxin Zhang
- Subjects
0301 basic medicine ,Red–black tree ,Speedup ,Theoretical computer science ,K-ary tree ,Phylogenetic tree ,0206 medical engineering ,02 engineering and technology ,Coalescent theory ,03 medical and health sciences ,Tree (data structure) ,030104 developmental biology ,Artificial Intelligence ,Hardware and Architecture ,Control and Systems Engineering ,Tree rearrangement ,Algorithm ,020602 bioinformatics ,Software ,Order statistic tree ,Information Systems ,Mathematics - Abstract
By reconciling the phylogenetic tree of a gene family with the corresponding species tree, it is possible to infer lineage-specific duplications and losses with high confidence and hence to annotate orthologs and paralogs. The currently available reconciliation methods for nonbinary gene trees are computationally expensive for genome-scale applications. We present four O (| G |+| S |) algorithms to reconcile an arbitrary gene tree G with a binary species tree S in the duplication, loss, duploss (also known as mutation), and deep coalescence cost models, where |· | denotes the number of nodes in a tree. The improvement is achieved through two innovations: a linear-time computation of compressed child-image subtrees and efficient reconstruction of irreducible duplication histories. Our technique for child-image subtree compression also results in an order of magnitude speedup in runtime for the dynamic programming and Wagner parsimony--based methods for tree reconciliation in the affine cost model.
- Published
- 2017
- Full Text
- View/download PDF
14. Detecting hybridization by likelihood calculation of gene tree extra lineages given explicit models
- Author
-
Luciano Javier Avila, Mariana Morando, Melisa Olave, and Jack W. Sites
- Subjects
0106 biological sciences ,0301 basic medicine ,Otras Ciencias Biológicas ,Biology ,MODEL-BASED ANALYSIS ,010603 evolutionary biology ,01 natural sciences ,Gene flow ,Coalescent theory ,LIKELIHOOD ,Ciencias Biológicas ,03 medical and health sciences ,Gene duplication ,GENE FLOW ,DEEP COALESCENCE ,Gene ,Ecology, Evolution, Behavior and Systematics ,Genetics ,Ecological Modeling ,030104 developmental biology ,Evolutionary biology ,Tree rearrangement ,Likelihood-ratio test ,Horizontal gene transfer ,Likelihood function ,HYBRIDIZATION ,CIENCIAS NATURALES Y EXACTAS - Abstract
Explanations for gene tree discordance with respect to a species tree are commonly attributed to deep coalescence (also known as incomplete lineage sorting [ILS]), as well as different evolutionary processes such as hybridization, horizontal gene transfer and gene duplication. Among these, deep coalescence is usually quantified as the number of extra linages and has been studied as the principal source of discordance among gene trees, while the other processes that could contribute to gene tree discordance have not been fully explored. This is an important issue for hybridization because interspecific gene flow is well documented and widespread across many plant and animal groups. Here, we propose a new way to detect gene flow when ILS is present that evaluates the likelihood of different models with various levels of gene flow, by comparing the expected gene tree discordance, using the number of extra lineages. This approach consists of proposing a model, simulating a set of gene trees to infer a distribution of expected extra lineages given the model, and calculating a likelihood function by comparing the fit of the real gene trees to the simulated distribution. To count extra lineages, the gene tree is first reconciled within the species tree, and for a given species tree branch the number of gene lineages minus one is counted. We develop a set of r functions to parallelize software to allow simulations, and to compare hypotheses via a likelihood ratio test to evaluate the presence of gene flow when ILS is present, in a fast and simple way. Our results show high accuracy under very challenging scenarios of high impact of ILS and low gene flow levels, even using a modest dataset of 5–10 loci and 5–10 individuals per species. We present a powerful and fast method to detect hybridization in the presence of ILS. We discuss its advantage with large dataset (such as genomic scale), and also identifies possible issues that should be explored with more complex models in future studies. Fil: Olave, Melisa. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Centro Nacional Patagónico. Instituto Patagónico para el Estudio de los Ecosistemas Continentales; Argentina. University of Konstanz; Alemania Fil: Avila, Luciano Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Centro Nacional Patagónico. Instituto Patagónico para el Estudio de los Ecosistemas Continentales; Argentina Fil: Sites, Jack W.. Brigham Young University; Estados Unidos Fil: Morando, Mariana. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Centro Nacional Patagónico. Instituto Patagónico para el Estudio de los Ecosistemas Continentales; Argentina
- Published
- 2017
- Full Text
- View/download PDF
15. Live phylogeny with polytomies: Finding the most compact parsimonious trees
- Author
-
Georgios Papamichail, Dimitris Papamichail, Angela Huang, Edward Kennedy, Jan-Lucas Ott, and Andrew Richard Miller
- Subjects
0301 basic medicine ,Binary tree ,Theoretical computer science ,Phylogenetic tree ,0206 medical engineering ,Organic Chemistry ,Inference ,Zoology ,02 engineering and technology ,Phylogenetic network ,Biology ,Biochemistry ,Maximum parsimony ,03 medical and health sciences ,Computational Mathematics ,Tree (data structure) ,ComputingMethodologies_PATTERNRECOGNITION ,030104 developmental biology ,Structural Biology ,Phylogenetics ,Tree rearrangement ,Algorithms ,Phylogeny ,020602 bioinformatics - Abstract
Construction of phylogenetic trees has traditionally focused on binary trees where all species appear on leaves, a problem for which numerous efficient solutions have been developed. Certain application domains though, such as viral evolution and transmission, paleontology, linguistics, and phylogenetic stemmatics, often require phylogeny inference that involves placing input species on ancestral tree nodes (live phylogeny), and polytomies. These requirements, despite their prevalence, lead to computationally harder algorithmic solutions and have been sparsely examined in the literature to date. In this article we prove some unique properties of most parsimonious live phylogenetic trees with polytomies, and their mapping to traditional binary phylogenetic trees. We show that our problem reduces to finding the most compact parsimonious tree for n species, and describe a novel efficient algorithm to find such trees without resorting to exhaustive enumeration of all possible tree topologies.
- Published
- 2017
- Full Text
- View/download PDF
16. Isometric gene tree reconciliation revisited
- Author
-
Askar Gafurov, Broňa Brejová, Michal Sabo, Dana Pardubská, and Tomáš Vinař
- Subjects
0301 basic medicine ,Theoretical computer science ,Gene family evolution ,lcsh:QH426-470 ,0206 medical engineering ,Gene tree reconciliation ,Context (language use) ,02 engineering and technology ,Biology ,Combinatorics ,03 medical and health sciences ,Structural Biology ,Molecular Biology ,lcsh:QH301-705.5 ,Phylogenetic tree ,Research ,Applied Mathematics ,Gene tree ,Binary logarithm ,Level ancestor ,Running time ,Tree (data structure) ,lcsh:Genetics ,030104 developmental biology ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Tree rearrangement ,020602 bioinformatics - Abstract
Background Isometric gene tree reconciliation is a gene tree/species tree reconciliation problem where both the gene tree and the species tree include branch lengths, and these branch lengths must be respected by the reconciliation. The problem was introduced by Ma et al. in 2008 in the context of reconstructing evolutionary histories of genomes in the infinite sites model. Results In this paper, we show that the original algorithm by Ma et al. is incorrect, and we propose a modified algorithm that addresses the problems that we discovered. We have also improved the running time from \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$O(N^2)$$\end{document}O(N2) to \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$O(N\log N)$$\end{document}O(NlogN), where N is the total number of nodes in the two input trees. Finally, we examine two new variants of the problem: reconciliation of two unrooted trees and scaling of branch lengths of the gene tree during reconciliation of two rooted trees. Conclusions We provide several new algorithms for isometric reconciliation of trees. Some questions in this area remain open; most importantly extensions of the problem allowing for imprecise estimates of branch lengths.
- Published
- 2017
- Full Text
- View/download PDF
17. Statistically Consistent k-mer Methods for Phylogenetic Tree Reconstruction
- Author
-
Seth Sullivant, John A. Rhodes, and Elizabeth S. Allman
- Subjects
0301 basic medicine ,0206 medical engineering ,02 engineering and technology ,Evolution, Molecular ,Combinatorics ,03 medical and health sciences ,Tree (descriptive set theory) ,Computational phylogenetics ,Genetics ,Computer Simulation ,Molecular Biology ,Phylogeny ,Alignment-free sequence analysis ,Mathematics ,Multiple sequence alignment ,Base Sequence ,Models, Genetic ,Phylogenetic tree ,Sequence Analysis, DNA ,Computational Mathematics ,030104 developmental biology ,Computational Theory and Mathematics ,Modeling and Simulation ,Tree rearrangement ,Metric (mathematics) ,Identifiability ,Sequence Alignment ,Algorithm ,Algorithms ,020602 bioinformatics - Abstract
Frequencies of k-mers in sequences are sometimes used as a basis for inferring phylogenetic trees without first obtaining a multiple sequence alignment. We show that a standard approach of using the squared Euclidean distance between k-mer vectors to approximate a tree metric can be statistically inconsistent. To remedy this, we derive model-based distance corrections for orthologous sequences without gaps, which lead to consistent tree inference. The identifiability of model parameters from k-mer frequencies is also studied. Finally, we report simulations showing that the corrected distance outperforms many other k-mer methods, even when sequences are generated with an insertion and deletion process. These results have implications for multiple sequence alignment as well since k-mer methods are usually the first step in constructing a guide tree for such algorithms.
- Published
- 2017
- Full Text
- View/download PDF
18. Node-depth phylogenetic-based encoding, a spanning-tree representation for evolutionary algorithms. part I: Proposal and properties analysis
- Author
-
Anderson da Silva Soares, Alexandre C. B. Delbem, João Bosco Augusto London Junior, Fernando Marques Federson, Jeffrey Van Baalen, and Telma Woerle de Lima
- Subjects
Spanning tree ,General Computer Science ,Phylogenetic tree ,Computer science ,General Mathematics ,Locality ,Evolutionary algorithm ,ROBÓTICA ,0102 computer and information sciences ,02 engineering and technology ,Data structure ,01 natural sciences ,Tree (data structure) ,010201 computation theory & mathematics ,Tree rearrangement ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Time complexity ,Algorithm - Abstract
Representation choice and the development of search operators are crucial aspects of the efficiency of Evolutionary Algorithms (EAs) in combinatorial problems. Several researchers have proposed representations and operators for EAs that manipulate spanning trees. This paper proposes a new encoding called Node-depth Phylogenetic-based Encoding (NPE). NPE represents spanning trees by the relation between nodes and their depths using a relatively simple codification/decodification process. The proposed NPE operators are based on methods used for tree rearrangement in phylogenetic tree reconstruction: subtree prune and regraft; and tree bisection and reconstruction. NPE and its operators are designed to have high locality, feasibility, low time complexity, be unbiased, and have independent weight. Therefore, NPE is a good choice of data structure for EAs applied to network design problems.
- Published
- 2016
- Full Text
- View/download PDF
19. Challenges in Species Tree Estimation Under the Multispecies Coalescent Model
- Author
-
Bo Xu and Ziheng Yang
- Subjects
0106 biological sciences ,0301 basic medicine ,Concatenation ,Inference ,Review ,Biology ,computer.software_genre ,Bayesian inference ,Bioinformatics ,010603 evolutionary biology ,01 natural sciences ,Coalescent theory ,03 medical and health sciences ,Genetics ,Animals ,Molecular clock ,Phylogeny ,Likelihood Functions ,Polymorphism, Genetic ,Models, Genetic ,Phylogenetic tree ,Tree (data structure) ,030104 developmental biology ,Tree rearrangement ,Data mining ,computer - Abstract
The multispecies coalescent (MSC) model has emerged as a powerful framework for inferring species phylogenies while accounting for ancestral polymorphism and gene tree-species tree conflict. A number of methods have been developed in the past few years to estimate the species tree under the MSC. The full likelihood methods (including maximum likelihood and Bayesian inference) average over the unknown gene trees and accommodate their uncertainties properly but involve intensive computation. The approximate or summary coalescent methods are computationally fast and are applicable to genomic datasets with thousands of loci, but do not make an efficient use of information in the multilocus data. Most of them take the two-step approach of reconstructing the gene trees for multiple loci by phylogenetic methods and then treating the estimated gene trees as observed data, without accounting for their uncertainties appropriately. In this article we review the statistical nature of the species tree estimation problem under the MSC, and explore the conceptual issues and challenges of species tree estimation by focusing mainly on simple cases of three or four closely related species. We use mathematical analysis and computer simulation to demonstrate that large differences in statistical performance may exist between the two classes of methods. We illustrate that several counterintuitive behaviors may occur with the summary methods but they are due to inefficient use of information in the data by summary methods and vanish when the data are analyzed using full-likelihood methods. These include (i) unidentifiability of parameters in the model, (ii) inconsistency in the so-called anomaly zone, (iii) singularity on the likelihood surface, and (iv) deterioration of performance upon addition of more data. We discuss the challenges and strategies of species tree inference for distantly related species when the molecular clock is violated, and highlight the need for improving the computational efficiency and model realism of the likelihood methods as well as the statistical efficiency of the summary methods.
- Published
- 2016
- Full Text
- View/download PDF
20. Improvement of the end result of multiple sequence alignment by tree rearrangement methods and the principle of minimum evolution
- Author
-
Oliveira, José Augusto Sabino, Universidade Estadual Paulista (Unesp), and Zafalon, Geraldo Francisco Donegá [UNESP]
- Subjects
Tree rearrangement ,Rearranjo de árvores ,Bioinformatics ,Progressive alignment ,Bioinformática ,Multiple sequence alignment ,Guide tree refinement ,Alinhamento múltiplo de sequências ,Refinamento da árvore guia ,Alinhamento progressivo - Abstract
Submitted by Jose Augusto Sabino de Oliveira (augusto.sabino@unesp.br) on 2019-10-16T01:07:03Z No. of bitstreams: 1 02.1 Unesp___Defesa_2019.pdf: 2217976 bytes, checksum: 5413e1483d17e420355cd2e88f23953d (MD5) Approved for entry into archive by Elza Mitiko Sato null (elzasato@ibilce.unesp.br) on 2019-10-16T14:28:56Z (GMT) No. of bitstreams: 1 oliveira_jas_me_sjrp_par.pdf: 236922 bytes, checksum: 5d9ddd168cddd3214ab71afa7f8f9dac (MD5) Made available in DSpace on 2019-10-16T14:28:56Z (GMT). No. of bitstreams: 1 oliveira_jas_me_sjrp_par.pdf: 236922 bytes, checksum: 5d9ddd168cddd3214ab71afa7f8f9dac (MD5) Previous issue date: 2019-08-07 Os recentes avanços científicos e tecnológicos proporcionaram um grande aumento dos dados biológicos disponíveis para análise e técnicas manuais são inviáveis para executar a análise desses dados e, assim, a Bioinformática surgiu devido a necessidade de analisar-se esse volume de dados com ferramentas computacionais e possibilitar inferências relevantes. Neste contexto, técnicas de alinhamento de sequências são ferramentas imprescindíveis. No entanto, alinhar sequências tem alto custo computacional e neste cenário, adota-se o Alinhamento Múltiplo de Sequências no qual diversas heurísticas são adotadas para amenizar o problema do custo computacional. Uma destas é o Alinhamento Progressivo que funciona basicamente em três fases: alinhamento par a par, construção da árvore guia e o alinhamento de perfis. Diversos estudos destacam a importância da árvore guia e de refiná-la para manter as sequências mais similares em ramificações próximas e assim, produzir um resultado com melhor significância biológica. Assim, no presente trabalho apresenta-se conceitos e técnicas essenciais da Bioinformática e de construção da árvore guia e, por fim apresenta-se um método de refinamento da árvore guia aplicado em ferramentas que usam o Alinhamento Progressivo e que trabalha entre as etapas de construção da árvore guia e do profile final. O método recebe como entrada uma árvore guia e sua matriz de distâncias produzida pelo Alinhamento Progressivo, promove mudanças em suas ramificações internas e avalia se a probabilidade da árvore produzida é mais próxima da árvore verdadeira. Se probabilidade da árvore produzida é maior que a da árvore guia inicial, a nova árvore é devolvida como a árvore guia para a ferramenta de alinhamento construir o profile final, o que proporciona resultados com melhor significado biológico. Recent scientific and technological advances have provided a large increase in the biological data available for analysis and manual techniques are unfeasible to perform the analysis of this data and thus Bioinformatics arose due to the need to analyze this data volume with computational tools and enable relevant inferences. In this context, sequence alignment techniques are essential tools. However, aligning sequences has a high computational cost and in this scenario, Multiple Sequence Alignment is adopted in which several heuristics are adopted to alleviate the computational cost problem. One of these is Progressive Alignment which works basically in three phases: pairwise alignment, guide tree construction and profile alignment. Several studies highlight the importance of the guide tree and refining it to maintain the most similar sequences in close branches and thus produce a result with better biological significance. Thus, in the present work we discuss essential concepts and techniques of Bioinformatics and the construction of the guide tree and, finally, we present a method of refinement of the guide tree applied in tools that use Progressive Alignment and that works between he guide tree build step and the final profile build step. The method receives as input a guide tree and its distance matrix produced by Progressive Alignment, makes changes in its internal branches and evaluates if the probability of the produced tree is closer to the true tree. If the produced tree’s probability is greater than that of the initial guide tree, the new tree is returned as the guide tree for the alignment tool to build the final profile, which gives better biological significance results.
- Published
- 2019
21. Gene tree species tree reconciliation with gene conversion
- Author
-
Damir Hasic, Eric Tannier, University of Sarajevo, UNIVERZITET U SARAJEVU, Artificial Evolution and Computational Biology (BEAGLE), Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE), Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2), UNIVERSITY OF SARAJEVO - UNIVERZITET U SARAJEVU, Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE), and Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Most recent common ancestor ,FOS: Computer and information sciences ,Theoretical computer science ,Gene Transfer, Horizontal ,Gene Conversion ,Biology ,Quantitative Biology - Quantitative Methods ,01 natural sciences ,species tree ,010305 fluids & plasmas ,Evolution, Molecular ,03 medical and health sciences ,Gene Duplication ,Computer Science - Data Structures and Algorithms ,0103 physical sciences ,Gene duplication ,Computer Simulation ,Data Structures and Algorithms (cs.DS) ,Gene conversion ,conversion ,Phylogeny ,Quantitative Methods (q-bio.QM) ,ComputingMilieux_MISCELLANEOUS ,Probability ,030304 developmental biology ,Event (probability theory) ,0303 health sciences ,Models, Genetic ,Phylogenetic tree ,Ecology ,Applied Mathematics ,loss ,Agricultural and Biological Sciences (miscellaneous) ,Randomized algorithm ,Tree (data structure) ,duplication ,gene tree ,reconciliation ,Modeling and Simulation ,Tree rearrangement ,FOS: Biological sciences ,[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM] ,Algorithms ,Gene Deletion - Abstract
Gene tree/species tree reconciliation is a recent decisive progress in phylogenetic methods, accounting for the possible differences between gene histories and species histories. Reconciliation consists in explaining these differences by gene-scale events such as duplication, loss, transfer, which translates mathematically into a mapping between gene tree nodes and species tree nodes or branches. Gene conversion is a frequent and important evolutionary event, which results in the replacement of a gene by a copy of another from the same species and in the same gene tree. Including this event in reconciliation models has never been attempted because it introduces a dependency between lineages, and standard algorithms based on dynamic programming become ineffective. We propose here a novel mathematical framework including gene conversion as an evolutionary event in gene tree/species tree reconciliation. We describe a randomized algorithm that finds, in polynomial running time, a reconciliation minimizing the number of duplications, losses and conversions in the case when their weights are equal. We show that the space of optimal reconciliations includes an analog of the last common ancestor reconciliation, but is not limited to it. Our algorithm outputs any optimal reconciliation with a non-null probability. We argue that this study opens a research avenue on including gene conversion in reconciliation, and discuss its possible importance in biology.
- Published
- 2019
- Full Text
- View/download PDF
22. Genome-Wide Comparative Analysis of Phylogenetic Trees: The Prokaryotic Forest of Life
- Author
-
Eugene V. Koonin, Yuri I. Wolf, and Pere Puigbò
- Subjects
Bacteria ,Gene Transfer, Horizontal ,Phylogenetic tree ,Tree of life (biology) ,Genomics ,Phylogenetic comparative methods ,Phylogenetic network ,Biology ,Bioinformatics ,Archaea ,Tree (graph theory) ,Article ,Evolutionary biology ,Tree rearrangement ,Computational phylogenetics ,Split ,Algorithms ,Phylogeny - Abstract
Genome-wide comparison of phylogenetic trees is becoming an increasingly common approach in evolutionary genomics, and a variety of approaches for such comparison have been developed. In this article we present several methods for comparative analysis of large numbers of phylogenetic trees. To compare phylogenetic trees taking into account the bootstrap support for each internal branch, the Boot-Split Distance (BSD) method is introduced as an extension of the previously developed Split Distance (SD) method for tree comparison. The BSD method implements the straightforward idea that comparison of phylogenetic trees can be made more robust by treating tree splits differentially depending on the bootstrap support. Approaches are also introduced for detecting tree-like and net-like evolutionary trends in the phylogenetic Forest of Life (FOL), i.e., the entirety of the phylogenetic trees for conserved genes of prokaryotes. The principal method employed for this purpose includes mapping quartets of species onto trees to calculate the support of each quartet topology and so to quantify the tree and net contributions to the distances between species. We describe the applications methods used to analyze the FOL and the results obtained with these methods. These results support the concept of the Tree of Life (TOL) as a central evolutionary trend in the FOL as opposed to the traditional view of the TOL as a ‘species tree’.
- Published
- 2019
- Full Text
- View/download PDF
23. Correcting Gene Trees by Leaf Insertions: Complexity and Approximation
- Author
-
Stefano Beretta, Riccardo Dondi, Beretta, S, and Dondi, R
- Subjects
0301 basic medicine ,K-ary tree ,Computational Complexity ,General Computer Science ,0206 medical engineering ,Gene Tree-Species Tree Reconciliation ,02 engineering and technology ,Algorithms ,Computational biology ,Gene tree corrections ,Phylogenomics ,Theoretical Computer Science ,Computer Science ,Combinatorics ,03 medical and health sciences ,Phylogenomic ,Mathematics ,Multiset ,Settore INF/01 - Informatica ,Computer Science (all) ,Gene tree correction ,Approximation algorithm ,Search tree ,Algorithm ,Tree (data structure) ,Tree traversal ,ComputingMethodologies_PATTERNRECOGNITION ,030104 developmental biology ,Tree rearrangement ,020602 bioinformatics ,Order statistic tree ,Computer Science(all) - Abstract
Gene tree correction has recently gained interest in phylogenomics, as it gives insights in understanding the evolution of gene families. Following some recent approaches based on leaf edit operations, we consider a variant of the problem where a gene tree is corrected by inserting leaves with labels in a multiset M. We show that the problem of deciding whether a gene tree can be corrected by inserting leaves with labels in M is NP-complete. Then, we consider an optimization variant of the problem that asks for the correction of a gene tree with leaves labeled by a multiset M ′ , with M ′ ⊇ M , having minimum size. For this optimization variant of the problem, we present a factor 2 approximation algorithm.
- Published
- 2016
- Full Text
- View/download PDF
24. Distributions of topological tree metrics between a species tree and a gene tree
- Author
-
Jing Xi, Ruriko Yoshida, and Jin Xie
- Subjects
0301 basic medicine ,Statistics and Probability ,Discrete mathematics ,K-ary tree ,0206 medical engineering ,Populations and Evolution (q-bio.PE) ,02 engineering and technology ,Interval tree ,Topology ,Search tree ,Random binary tree ,Range tree ,Combinatorics ,03 medical and health sciences ,030104 developmental biology ,Tree structure ,FOS: Biological sciences ,Tree rearrangement ,Quantitative Biology - Populations and Evolution ,020602 bioinformatics ,Tree measurement ,Mathematics - Abstract
In order to conduct a statistical analysis on a given set of phylogenetic gene trees, we often use a distance measure between two trees. In a statistical distance-based method to analyze discordance between gene trees, it is a key to decide "biological meaningful" and "statistically well-distributed" distance between trees. Thus, in this paper, we study the distributions of the three tree distance metrics: the edge difference, the path difference, and the precise $K$ interval cospeciation distance, between two trees: first, we focus on distributions of the three tree distances between two random unrooted trees with $n$ leaves ($n \geq 4$); and then we focus on the distributions the three tree distances between a fixed rooted species tree with $n$ leaves and a random gene tree with $n$ leaves generated under the coalescent process with given the species tree. We show some theoretical results as well as simulation study on these distributions., 18 pages, 11 figures
- Published
- 2016
- Full Text
- View/download PDF
25. Does Gene Tree Discordance Explain the Mismatch between Macroevolutionary Models and Empirical Patterns of Tree Shape and Branching Times?
- Author
-
Tanja Stadler, James H. Degnan, and Noah A. Rosenberg
- Subjects
0106 biological sciences ,0301 basic medicine ,Genetic Speciation ,Biology ,phylogeny ,Birth-death process ,Genealogy ,Multispecies coalescent ,Phylogeny ,010603 evolutionary biology ,01 natural sciences ,Models, Biological ,Coalescent theory ,Time ,Branching (linguistics) ,03 medical and health sciences ,Phylogenetics ,Genetics ,Computer Simulation ,genealogy ,Ecology, Evolution, Behavior and Systematics ,Phylogenetic tree ,Ecology ,Gene tree ,Birth–death process ,Classification ,multispecies coalescent ,030104 developmental biology ,Evolutionary biology ,Tree rearrangement ,Regular Articles - Abstract
Classic null models for speciation and extinction give rise to phylogenies that differ in distribution from empirical phylogenies. In particular, empirical phylogenies are less balanced and have branching times closer to the root compared to phylogenies predicted by common null models. This difference might be due to null models of the speciation and extinction process being too simplistic, or due to the empirical datasets not being representative of random phylogenies. A third possibility arises because phylogenetic reconstruction methods often infer gene trees rather than species trees, producing an incongruity between models that predict species tree patterns and empirical analyses that consider gene trees. We investigate the extent to which the difference between gene trees and species trees under a combined birth–death and multispecies coalescent model can explain the difference in empirical trees and birth–death species trees. We simulate gene trees embedded in simulated species trees and investigate their difference with respect to tree balance and branching times. We observe that the gene trees are less balanced and typically have branching times closer to the root than the species trees. Empirical trees from TreeBase are also less balanced than our simulated species trees, and model gene trees can explain an imbalance increase of up to 8% compared to species trees. However, we see a much larger imbalance increase in empirical trees, about 100%, meaning that additional features must also be causing imbalance in empirical trees. This simulation study highlights the necessity of revisiting the assumptions made in phylogenetic analyses, as these assumptions, such as equating the gene tree with the species tree, might lead to a biased conclusion., Systematic Biology, 65 (4), ISSN:1063-5157, ISSN:1076-836X
- Published
- 2016
26. Analysis of a Rapid Evolutionary Radiation Using Ultraconserved Elements: Evidence for a Bias in Some Multispecies Coalescent Methods
- Author
-
Edward L. Braun, Kelly A. Meiklejohn, Travis C. Glenn, Rebecca T. Kimball, and Brant C. Faircloth
- Subjects
0106 biological sciences ,0301 basic medicine ,Polytomy ,Genetics ,Pseudolikelihood ,Phylogenetic tree ,High-Throughput Nucleotide Sequencing ,Biology ,Classification ,Biological Evolution ,Models, Biological ,010603 evolutionary biology ,01 natural sciences ,Coalescent theory ,03 medical and health sciences ,030104 developmental biology ,Phylogenetics ,Evolutionary biology ,Tree rearrangement ,Supermatrix ,Clade ,Phylogeny ,Ecology, Evolution, Behavior and Systematics ,Probability - Abstract
Rapid evolutionary radiations are expected to require large amounts of sequence data to resolve. To resolve these types of relationships many systematists believe that it will be necessary to collect data by next-generation sequencing (NGS) and use multispecies coalescent ("species tree") methods. Ultraconserved element (UCE) sequence capture is becoming a popular method to leverage the high throughput of NGS to address problems in vertebrate phylogenetics. Here we examine the performance of UCE data for gallopheasants (true pheasants and allies), a clade that underwent a rapid radiation 10-15 Ma. Relationships among gallopheasant genera have been difficult to establish. We used this rapid radiation to assess the performance of species tree methods, using ∼600 kilobases of DNA sequence data from ∼1500 UCEs. We also integrated information from traditional markers (nuclear intron data from 15 loci and three mitochondrial gene regions). Species tree methods exhibited troubling behavior. Two methods [Maximum Pseudolikelihood for Estimating Species Trees (MP-EST) and Accurate Species TRee ALgorithm (ASTRAL)] appeared to perform optimally when the set of input gene trees was limited to the most variable UCEs, though ASTRAL appeared to be more robust than MP-EST to input trees generated using less variable UCEs. In contrast, the rooted triplet consensus method implemented in Triplec performed better when the largest set of input gene trees was used. We also found that all three species tree methods exhibited a surprising degree of dependence on the program used to estimate input gene trees, suggesting that the details of likelihood calculations (e.g., numerical optimization) are important for loci with limited phylogenetic information. As an alternative to summary species tree methods we explored the performance of SuperMatrix Rooted Triple - Maximum Likelihood (SMRT-ML), a concatenation method that is consistent even when gene trees exhibit topological differences due to the multispecies coalescent. We found that SMRT-ML performed well for UCE data. Our results suggest that UCE data have excellent prospects for the resolution of difficult evolutionary radiations, though specific attention may need to be given to the details of the methods used to estimate species trees.
- Published
- 2016
- Full Text
- View/download PDF
27. <scp>MO</scp> ‐Phylogenetics: a phylogenetic inference software tool with multi‐objective evolutionary metaheuristics
- Author
-
Antonio J. Nebro, Cristian Zambrano-Vega, and José F. Aldana-Montes
- Subjects
0301 basic medicine ,Computational complexity theory ,02 engineering and technology ,Biology ,Machine learning ,computer.software_genre ,Multi-objective optimization ,03 medical and health sciences ,Software ,Phylogenetics ,TheoryofComputation_ANALYSISOFALGORITHMSANDPROBLEMCOMPLEXITY ,0202 electrical engineering, electronic engineering, information engineering ,Metaheuristic ,Ecology, Evolution, Behavior and Systematics ,Phylogenetic tree ,business.industry ,Ecological Modeling ,Maximum parsimony ,ComputingMethodologies_PATTERNRECOGNITION ,030104 developmental biology ,Tree rearrangement ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,MathematicsofComputing_DISCRETEMATHEMATICS - Abstract
Summary Phylogenetic inference is the process of searching and reconstructing the best phylogenetic tree that explains the evolution of species from a given data set. It is considered as an NP-hard problem due to the computational complexity required to find the optimal phylogenetic trees in the space of all the possible topologies. We have developed MO-Phylogenetics, a software tool to infer phylogenetic trees optimizing two reconstruction criteria simultaneously, integrating a framework for multi-objective optimization with two phylogenetic software packages. As a result, researchers in life sciences have at their disposal a high-performance tool including a number of multi-objective metaheuristics that can be applied to phylogenetic inference using the maximum parsimony and maximum likelihood as objectives to be optimized at the same time.
- Published
- 2016
- Full Text
- View/download PDF
28. Neighborhoods of Phylogenetic Trees: Exact and Asymptotic Counts
- Author
-
Mike Steel, J. V. de Jong, and Jeanette C. McLeod
- Subjects
Discrete mathematics ,Phylogenetic tree ,General Mathematics ,0206 medical engineering ,Structure (category theory) ,02 engineering and technology ,01 natural sciences ,010101 applied mathematics ,Set (abstract data type) ,Combinatorics ,Tree (data structure) ,Phylogenetics ,Tree rearrangement ,Metric (mathematics) ,0101 mathematics ,020602 bioinformatics ,Mathematics - Abstract
A central theme in phylogenetics is the reconstruction and analysis of evolutionary trees from a given set of data. To determine the optimal search methods for reconstructing trees, it is crucial to understand the size and structure of the neighborhoods of trees under tree rearrangement operations. The diameter and size of the immediate neighborhood of a tree have been well-studied; however, little is known about the number of trees at distance two, three, or (more generally) $k$ from a given tree. In this paper we provide a number of exact and asymptotic results concerning these quantities and identify some key aspects of tree shape that play a role in determining these quantities. We obtain several new results for two of the main tree rearrangement operations---nearest neighbor interchange and subtree prune and regraft---as well as for the Robinson--Foulds metric on trees.
- Published
- 2016
- Full Text
- View/download PDF
29. A navigation system for tree space
- Author
-
Sarah Mark, Mike Steel, and Jeanette C. McLeod
- Subjects
0301 basic medicine ,Discrete mathematics ,K-ary tree ,General Computer Science ,Interval tree ,Search tree ,Computer Science Applications ,Theoretical Computer Science ,Range tree ,Combinatorics ,03 medical and health sciences ,030104 developmental biology ,Tree structure ,Computational Theory and Mathematics ,Tree rearrangement ,Geometry and Topology ,Left-child right-sibling binary tree ,Vantage-point tree ,Mathematics - Abstract
The reconstruction of evolutionary trees from data sets on overlapping sets of species is a central problem in phylogenetics. Provided that the tree reconstructed for each subset of species is rooted and that these trees t together consistently, the space of all parent trees that ‘display’ these trees was recently shown to satisfy the following strong property: there exists a path from any one parent tree to any other parent tree by a sequence of local rearrangements (nearest neighbour interchanges) so that each intermediate tree also lies in this same tree space. However, the proof of this result uses a non-constructive argument. In this paper we describe a specic, polynomial-time procedure for navigating from any given parent tree to another while remaining in this tree space. The results are of particular relevance to the recent study of ‘phylogenetic terraces’.
- Published
- 2016
- Full Text
- View/download PDF
30. Phylogenetic Networks with Every Embedded Phylogenetic Tree a Base Tree
- Author
-
Charles Semple
- Subjects
0301 basic medicine ,Class (set theory) ,K-ary tree ,Theoretical computer science ,Computer science ,General Mathematics ,Immunology ,computer.software_genre ,Models, Biological ,General Biochemistry, Genetics and Molecular Biology ,03 medical and health sciences ,Phylogeny ,General Environmental Science ,Pharmacology ,Models, Genetic ,Phylogenetic tree ,General Neuroscience ,Mathematical Concepts ,Phylogenetic network ,Base (topology) ,Biological Evolution ,Tree (data structure) ,030104 developmental biology ,Tree structure ,Computational Theory and Mathematics ,Tree rearrangement ,Data mining ,General Agricultural and Biological Sciences ,computer ,Algorithms - Abstract
We show that the class of tree-child networks is precisely the class of tree-based networks with the property that every embedded phylogenetic tree is a base tree.
- Published
- 2015
- Full Text
- View/download PDF
31. Algorithms for efficient phylogenetic tree construction
- Author
-
Oliver Eulenstein, David Fernández-Baca, and Mukul S. Bansal
- Subjects
Phylogenetic tree ,Phylogenetics ,T-REX ,Tree rearrangement ,Computational phylogenetics ,Phylogenetic network ,Phylogenetic comparative methods ,Biology ,Algorithm ,Genome - Abstract
The rapidly increasing amount of available genomic sequence data provides an abundance of potential information for phylogenetic analyses. Many models and methods have been developed to build evolutionary trees based on this information. A common feature of most of these models is that they start out with fragments of the genome, called genes. Depending on the genes and species, and the methods used to perform the phylogenetic analyses, one typically ends up with a large number of phylogenetic trees which may not agree with one another. Simply put, the problem now is the following: Given several discordant phylogenetic trees as input, infer the (presumably) correct phylogeny. This thesis seeks to address some of the methodological and algorithmic challenges posed by this problem. In particular, we present two new algorithms related to inferring phylogenetic trees in the presence of gene duplication, and introduce a new distance measure for comparing phylogenetic trees.
- Published
- 2018
- Full Text
- View/download PDF
32. Phylogeny reconciliation under gene tree parsimony
- Author
-
Wen-Chieh Chang and Oliver Eulenstein
- Subjects
Dynamic programming ,ComputingMethodologies_PATTERNRECOGNITION ,Theoretical computer science ,Phylogenetic tree ,Phylogenetics ,Knapsack problem ,Tree rearrangement ,Gene tree ,Integer programming ,Travelling salesman problem ,Algorithm ,Mathematics - Abstract
The growing genomic and phylogenetic data sets represent a unique opportunity to analytically and computationally study the relationship among diversifying species. Unfortunately, such data often result in contradictory gene phylogenies due to common yet unobserved evolutionary events, e.g., gene duplication or deep coalescence. Gene tree parsimony (GTP) methods address such issue by reconciling gene phylogenies into one consistent species evolutionary history as well as identifying the underlying events. In this study, we solve not only the GTP problem but also propose a new method to select gene trees in order to assist biologists in gaining insight from phylogenetic analysis. First, we introduce exact solutions for the intrinsically complex GTP problem. Exact solutions for NP-hard problems, like GTP, have a long and extensive history of improvements for classic problems such as traveling salesman and knapsack. Our solutions presented here are designed via integer linear programming (ILP) and dynamic programming (DP), which are techniques widely used in solving problems of similar complexity. We also demonstrate the effectiveness of our solutions through simulation analysis and empirical datasets. To ensure input data coherence for GTP analysis, as a method to strengthen species represented in a gene tree, we introduce the quasi-biclique (QBC) approach to analyze and condense input datasets. In order to take advantage of emerging techniques that further describe the sequence-host and gene-taxon relations, quasi-bicliques are optimized via weighted edge connectivities and distribution of missing information. Our study showed these QBC mining problems are NP-hard. We describe an ILP formulation that is capable of finding optimal QBCs in an effort to support GTP analysis. We also investigate the applicability of QBC to other applications such as mining genetic interaction networks to encouraging results.
- Published
- 2018
- Full Text
- View/download PDF
33. On the Subnet Prune and Regraft Distance
- Author
-
Jonathan Klawitter and Simone Linz
- Subjects
FOS: Computer and information sciences ,Discrete Mathematics (cs.DM) ,Context (language use) ,0102 computer and information sciences ,01 natural sciences ,Theoretical Computer Science ,Combinatorics ,FOS: Mathematics ,Discrete Mathematics and Combinatorics ,Mathematics - Combinatorics ,Quantitative Biology - Populations and Evolution ,Mathematics ,05C90, 92D15, 68R10 ,Phylogenetic tree ,Applied Mathematics ,Populations and Evolution (q-bio.PE) ,Phylogenetic network ,Directed acyclic graph ,Subnet ,Tree (data structure) ,Computational Theory and Mathematics ,010201 computation theory & mathematics ,Tree rearrangement ,FOS: Biological sciences ,Metric (mathematics) ,Geometry and Topology ,Combinatorics (math.CO) ,Computer Science - Discrete Mathematics ,MathematicsofComputing_DISCRETEMATHEMATICS - Abstract
Phylogenetic networks are rooted directed acyclic graphs that represent evolutionary relationships between species whose past includes reticulation events such as hybridisation and horizontal gene transfer. To search the space of phylogenetic networks, the popular tree rearrangement operation rooted subtree prune and regraft (rSPR) was recently generalised to phylogenetic networks. This new operation – called subnet prune and regraft (SNPR) – induces a metric on the space of all phylogenetic networks as well as on several widely-used network classes. In this paper, we investigate several problems that arise in the context of computing the SNPR-distance. For a phylogenetic tree $T$ and a phylogenetic network $N$, we show how this distance can be computed by considering the set of trees that are embedded in $N$ and then use this result to characterise the SNPR-distance between $T$ and $N$ in terms of agreement forests. Furthermore, we analyse properties of shortest SNPR-sequences between two phylogenetic networks $N$ and $N'$, and answer the question whether or not any of the classes of tree-child, reticulation-visible, or tree-based networks isometrically embeds into the class of all phylogenetic networks under SNPR.
- Published
- 2018
- Full Text
- View/download PDF
34. There are no caterpillars in a wicked forest
- Author
-
James H. Degnan and John A. Rhodes
- Subjects
education.field_of_study ,Models, Genetic ,Genetic Speciation ,Ecology ,Population ,Populations and Evolution (q-bio.PE) ,Inference ,Biology ,Quantitative Biology::Genomics ,Pedigree ,92D15 ,Coalescent theory ,Evolutionary biology ,Phylogenetics ,FOS: Biological sciences ,Tree rearrangement ,Genetic algorithm ,Humans ,Quantitative Biology::Populations and Evolution ,Tree (set theory) ,Quantitative Biology - Populations and Evolution ,education ,Ecology, Evolution, Behavior and Systematics - Abstract
Species trees represent the historical divergences of populations or species, while gene trees trace the ancestry of individual gene copies sampled within those populations. In cases involving rapid speciation, gene trees with topologies that differ from that of the species tree can be most probable under the standard multispecies coalescent model, making species tree inference more difficult. Such anomalous gene trees are not well understood except for some small cases. In this work, we establish one constraint that applies to trees of any size: gene trees with "caterpillar" topologies cannot be anomalous. The proof of this involves a new combinatorial object, called a population history, which keeps track of the number of coalescent events in each ancestral population., 16 pages, 4 figures
- Published
- 2015
- Full Text
- View/download PDF
35. Variable solution structure can be helpful in evolutionary optimization
- Author
-
Chao Qian, Yang Yu, and Zhi-Hua Zhou
- Subjects
Mathematical optimization ,General Computer Science ,Heuristic (computer science) ,Tree rearrangement ,Evolutionary algorithm ,Combinatorial optimization ,Genetic programming ,Genetic representation ,Evolutionary programming ,Evolutionary computation ,Mathematics - Abstract
Evolutionary algorithms are a family of powerful heuristic optimization algorithms where various representations have been used for solutions. Previous empirical studies have shown that for achieving a better efficiency of evolutionary optimization, it is often helpful to adopt rich representations (e.g., trees and graphs) rather than ordinary representations (e.g., binary coding). Such a recognition, however, has little theoretical justifications. In this paper, we present a running time analysis on genetic programming. In contrast to previous theoretical efforts focused on simple synthetic problems, we study two classical combinatorial problems, the maximum matching and the minimum spanning tree problems. Our theoretical analysis shows that evolving tree-structured solutions is much more efficient than evolving binary vector encoded solutions, which is also verified by experiments. The analysis discloses that variable solution structure might be helpful in evolutionary optimization when the solution complexity can be well controlled.
- Published
- 2015
- Full Text
- View/download PDF
36. Exploring Tree-Like and Non-Tree-Like Patterns Using Genome Sequences: An Example Using the Inbreeding Plant SpeciesArabidopsis thaliana(L.) Heynh
- Author
-
Bret Larget, Noah Stenz, David A. Baum, and Cécile Ané
- Subjects
Genetics ,Panmixia ,education.field_of_study ,Phylogenetic tree ,Population ,Arabidopsis ,Selfing ,Biology ,Classification ,Genome ,Coalescent theory ,Tree (data structure) ,Evolutionary biology ,Tree rearrangement ,Inbreeding ,education ,Genome, Plant ,Phylogeny ,Ecology, Evolution, Behavior and Systematics - Abstract
Genome sequence data contain abundant information about genealogical history, but methods for extracting and interpreting this information are not yet fully developed. We analyzed genome sequences for multiple accessions of the selfing plant, Arabidopsis thaliana, with the goal of better understanding its genealogical history. As expected from accessions of the same species, we found much discordance between nuclear gene trees. Nonetheless, we inferred the optimal population tree under the assumption that all discordance is due to incomplete lineage sorting. To cope with the size of the data (many genes and many taxa), our pipeline is based on parallel computing and divides the problem into four-taxon trees. However, just because a population tree can be estimated does not mean that the assumptions of the multispecies coalescent model hold. Therefore, we implemented a new, nonparametric test to evaluate whether a population tree adequately explains the observed quartet frequencies (the frequencies of gene trees with each resolution of each four-taxon set). This test also considers other models: panmixia and a partially resolved population tree, that is, a tree in which some nodes are collapsed into local panmixia. We found that a partially resolved population tree provides the best fit to the data, providing evidence for tree-like structure within A. thaliana, qualitatively similar to what might be expected between different, closely related species. Further, we show that the pattern of deviation from expectations can be used to identify instances of introgression and detect one clear case of reticulation among ecotypes that have come into contact in the United Kingdom. Our study illustrates how we can use genome sequence data to evaluate whether phylogenetic relationships are strictly tree-like or reticulating. (Concordance; gene tree discordance; quartet; species concepts; TICR.)
- Published
- 2015
- Full Text
- View/download PDF
37. Identifiability of the unrooted species tree topology under the coalescent model with time-reversible substitution processes, site-specific rate variation, and invariable sites
- Author
-
Julia Chifman and Laura Kubatko
- Subjects
Statistics and Probability ,Time Factors ,Genetic Speciation ,14, 32, 62, 92 ,Biology ,Models, Biological ,General Biochemistry, Genetics and Molecular Biology ,DNA sequencing ,Coalescent theory ,Evolution, Molecular ,Tree (descriptive set theory) ,Phylogenetics ,Computational phylogenetics ,Quantitative Biology - Populations and Evolution ,Phylogeny ,Probability ,General Immunology and Microbiology ,Phylogenetic tree ,Ecology ,Applied Mathematics ,Populations and Evolution (q-bio.PE) ,DNA ,Sequence Analysis, DNA ,General Medicine ,Process substitution ,Markov Chains ,Evolutionary biology ,FOS: Biological sciences ,Modeling and Simulation ,Tree rearrangement ,Mutation ,General Agricultural and Biological Sciences ,Algorithms - Abstract
The inference of the evolutionary history of a collection of organisms is a problem of fundamental importance in evolutionary biology. The abundance of DNA sequence data arising from genome sequencing projects has led to significant challenges in the inference of these phylogenetic relationships. Among these challenges is the inference of the evolutionary history of a collection of species based on sequence information from several distinct genes sampled throughout the genome. It is widely accepted that each individual gene has its own phylogeny, which may not agree with the species tree. Many possible causes of this gene tree incongruence are known. The best studied is the incomplete lineage sorting, which is commonly modeled by the coalescent process. Numerous methods based on the coalescent process have been proposed for the estimation of the phylogenetic species tree given DNA sequence data. However, use of these methods assumes that the phylogenetic species tree can be identified from DNA sequence data at the leaves of the tree, although this has not been formally established. We prove that the unrooted topology of the n-leaf phylogenetic species tree is generically identifiable given observed data at the leaves of the tree that are assumed to have arisen from the coalescent process under a time-reversible substitution process with the possibility of site-specific rate variation modeled by the discrete gamma distribution and a proportion of invariable sites.
- Published
- 2015
- Full Text
- View/download PDF
38. Consistency of a phylogenetic tree maximum likelihood estimator
- Author
-
Arindam RoyChoudhury, John Bunge, and Amy D. Willis
- Subjects
Statistics and Probability ,Phylogenetic tree ,Estimation theory ,Applied Mathematics ,Combinatorics ,Tree (data structure) ,Tree structure ,Tree rearrangement ,Computational phylogenetics ,Statistics ,Statistics, Probability and Uncertainty ,Order statistic tree ,Vantage-point tree ,Mathematics - Abstract
Phylogenetic trees represent the order and extent of genetic divergence of a fixed collection of organisms. Order of divergence is represented via the tree structure, and extent of divergence by the branch lengths. Both the tree’s structure and branch lengths are unknown parameters and the tree is estimated using sequence information sampled at a number of genetic sites. Under the model of genetic Brownian motion, we prove that as the number of genetic sites that are sampled becomes large, the maximum likelihood estimator of the tree is consistent. (Our maximum likelihood estimator treats each site as an independent data point, which is different from concatenating the sites.) Existing arguments for consistency rely on the assumption of a finite parameter space or only apply to transition probability matrix-based models, and do not hold here due to the continuous model for branch lengths. The metric space of Billera et al. (2001) is central to the proof. We conclude with some comments on the role of parametric methods in tree estimation.
- Published
- 2015
- Full Text
- View/download PDF
39. Improved gene tree error correction in the presence of horizontal gene transfer
- Author
-
Mukul S. Bansal, Manolis Kellis, Eric J. Alm, Yi-Chieh Wu, Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology. Department of Biological Engineering, Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science, Bansal, Mukul S., Wu, Yi-Chieh, Alm, Eric J., and Kellis, Manolis
- Subjects
Statistics and Probability ,Gene Transfer, Horizontal ,Computational biology ,Biology ,Cyanobacteria ,Biochemistry ,Evolution, Molecular ,Gene Duplication ,Computational phylogenetics ,Gene duplication ,Gene family ,Molecular Biology ,Phylogeny ,Statistical hypothesis testing ,Genetics ,Phylogenetic tree ,Computational Biology ,Original Papers ,3. Good health ,Computer Science Applications ,Phylogenetics ,Computational Mathematics ,Tree (data structure) ,Computational Theory and Mathematics ,Multigene Family ,Tree rearrangement ,Horizontal gene transfer ,Algorithms - Abstract
Motivation: The accurate inference of gene trees is a necessary step in many evolutionary studies. Although the problem of accurate gene tree inference has received considerable attention, most existing methods are only applicable to gene families unaffected by horizontal gene transfer. As a result, the accurate inference of gene trees affected by horizontal gene transfer remains a largely unaddressed problem. Results: In this study, we introduce a new and highly effective method for gene tree error correction in the presence of horizontal gene transfer. Our method efficiently models horizontal gene transfers, gene duplications and losses, and uses a statistical hypothesis testing framework [Shimodaira–Hasegawa (SH) test] to balance sequence likelihood with topological information from a known species tree. Using a thorough simulation study, we show that existing phylogenetic methods yield inaccurate gene trees when applied to horizontally transferred gene families and that our method dramatically improves gene tree accuracy. We apply our method to a dataset of 11 cyanobacterial species and demonstrate the large impact of gene tree accuracy on downstream evolutionary analyses. Availability and implementation: An implementation of our method is available at http://compbio.mit.edu/treefix-dtl/, National Science Foundation (U.S.) (CAREER Award 0644282), National Institutes of Health (U.S.) (RC2 HG005639), National Science Foundation (U.S.). Assembling the Tree of Life (Program) (0936234), University of Connecticut
- Published
- 2015
- Full Text
- View/download PDF
40. Estimating phylogenetic trees from genome-scale data
- Author
-
Liang Liu, Scott V. Edwards, Shaoyuan Wu, Zhenxiang Xi, and Charles C. Davis
- Subjects
Phylogenetic tree ,Ecology ,General Neuroscience ,Concatenation ,Phylogenetic network ,Biology ,Genome ,General Biochemistry, Genetics and Molecular Biology ,Coalescent theory ,Tree (data structure) ,History and Philosophy of Science ,Evolutionary biology ,Tree rearrangement ,Computational phylogenetics ,ComputingMethodologies_GENERAL - Abstract
The heterogeneity of signals in the genomes of diverse organisms poses challenges for traditional phylogenetic analysis. Phylogenetic methods known as "species tree" methods have been proposed to directly address one important source of gene tree heterogeneity, namely the incomplete lineage sorting that occurs when evolving lineages radiate rapidly, resulting in a diversity of gene trees from a single underlying species tree. Here we review theory and empirical examples that help clarify conflicts between species tree and concatenation methods, and misconceptions in the literature about the performance of species tree methods. Considering concatenation as a special case of the multispecies coalescent model helps explain differences in the behavior of the two methods on phylogenomic data sets. Recent work suggests that species tree methods are more robust than concatenation approaches to some of the classic challenges of phylogenetic analysis, including rapidly evolving sites in DNA sequences and long-branch attraction. We show that approaches, such as binning, designed to augment the signal in species tree analyses can distort the distribution of gene trees and are inconsistent. Computationally efficient species tree methods incorporating biological realism are a key to phylogenetic analysis of whole-genome data.
- Published
- 2015
- Full Text
- View/download PDF
41. On the Robustness to Gene Tree Estimation Error (or lack thereof) of Coalescent-Based Species Tree Methods
- Author
-
Sebastien Roch and Tandy Warnow
- Subjects
Phylogenetic tree ,Gene tree ,Robustness (evolution) ,Biology ,Classification ,Network topology ,Coalescent theory ,Genes ,Evolutionary biology ,Tree rearrangement ,Statistics ,Genetics ,Computer Simulation ,Phylogeny ,Ecology, Evolution, Behavior and Systematics - Abstract
The estimation of species trees using multiple loci has become increasingly common. Because different loci can have different phylogenetic histories (reflected in different gene tree topologies) for multiple biological causes, new approaches to species tree estimation have been developed that take gene tree heterogeneity into account. Among these multiple causes, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is potentially the most common cause of gene tree heterogeneity, and much of the focus of the recent literature has been on how to estimate species trees in the presence of ILS. Despite progress in developing statistically consistent techniques for estimating species trees when gene trees can differ due to ILS, there is substantial controversy in the systematics community as to whether to use the new coalescent-based methods or the traditional concatenation methods. One of the key issues that has been raised is understanding the impact of gene tree estimation error on coalescent-based methods that operate by combining gene trees. Here we explore the mathematical guarantees of coalescent-based methods when analyzing estimated rather than true gene trees. Our results provide some insight into the differences between promise of coalescent-based methods in theory and their performance in practice.
- Published
- 2015
- Full Text
- View/download PDF
42. Resolving Evolutionary Relationships in Closely Related Species with Whole-Genome Sequencing Data
- Author
-
Takeshi Kawakami, Alexander Nater, Reto Burri, Hans Ellegren, and Linnéa Smeds
- Subjects
Gene Flow ,0106 biological sciences ,introgression ,Biology ,species tree ,010603 evolutionary biology ,01 natural sciences ,Gene flow ,Coalescent theory ,Songbirds ,03 medical and health sciences ,Species Specificity ,Effective population size ,Phylogenetics ,Computational phylogenetics ,Phylogenomics ,Genetics ,Animals ,Computer Simulation ,Phylogeny ,Ecology, Evolution, Behavior and Systematics ,030304 developmental biology ,0303 health sciences ,Genome ,Phylogenetic tree ,Ecology ,demographic modeling ,incomplete lineage sorting ,phylogenomics ,15. Life on land ,gene tree ,Evolutionary biology ,Tree rearrangement ,Approximate Bayesian computation ,Regular Articles - Abstract
Using genetic data to resolve the evolutionary relationships of species is of major interest in evolutionary and systematic biology. However, reconstructing the sequence of speciation events, the so-called species tree, in closely related and potentially hybridizing species is very challenging. Processes such as incomplete lineage sorting and interspecific gene flow result in local gene genealogies that differ in their topology from the species tree, and analyses of few loci with a single sequence per species are likely to produce conflicting or even misleading results. To study these phenomena on a full phylogenomic scale, we use whole-genome sequence data from 200 individuals of four black-and-white flycatcher species with so far unresolved phylogenetic relationships to infer gene tree topologies and visualize genome-wide patterns of gene tree incongruence. Using phylogenetic analysis in nonoverlapping 10-kb windows, we show that gene tree topologies are extremely diverse and change on a very small physical scale. Moreover, we find strong evidence for gene flow among flycatcher species, with distinct patterns of reduced introgression on the Z chromosome. To resolve species relationships on the background of widespread gene tree incongruence, we used four complementary coalescent-based methods for species tree reconstruction, including complex modeling approaches that incorporate post-divergence gene flow among species. This allowed us to infer the most likely species tree with high confidence. Based on this finding, we show that regions of reduced effective population size, which have been suggested as particularly useful for species tree inference, can produce positively misleading species tree topologies. Our findings disclose the pitfalls of using loci potentially under selection as phylogenetic markers and highlight the potential of modeling approaches to disentangle species relationships in systems with large effective population sizes and post-divergence gene flow.
- Published
- 2015
43. Parameterized and approximation algorithms for maximum agreement forest in multifurcating trees
- Author
-
Sing-Hoi Sze, Jia-Hao Fan, and Jianer Chen
- Subjects
Combinatorics ,Discrete mathematics ,General Computer Science ,Open problem ,Tree rearrangement ,Link/cut tree ,Maximum coverage problem ,Vertex cover ,Weight-balanced tree ,Approximation algorithm ,Parameterized complexity ,Theoretical Computer Science ,Mathematics - Abstract
We study parameterized algorithms and approximation algorithms for the maximum agreement forest problem, which, for two given leaf-labeled trees, is to find a maximum forest that is a subgraph of both trees. The problem was motivated by research in phylogenetics. For parameterized algorithms, while the problem is known to be fixed-parameter tractable for binary trees, it was an open problem whether the problem is still fixed-parameter tractable for general trees. We resolve this open problem by developing an O ( 3 k n ) -time parameterized algorithm for general trees. Our techniques on tree structures also lead to a polynomial-time approximation algorithm of ratio 3 for the problem, giving the first constant-ratio approximation algorithm for general trees.
- Published
- 2015
- Full Text
- View/download PDF
44. On computing the maximum parsimony score of a phylogenetic network
- Author
-
Celine Scornavacca, Mareike Fischer, Steven Kelk, Leo van Iersel, RS: FSE DACS BMI, DKE Scientific staff, Ernst-Moritz-Arndt-Universität Greifswald, Delft University of Technology (TU Delft), Department of data science and Knowledge Engineering [Maastricht], Maastricht University [Maastricht], Institut des Sciences de l'Evolution de Montpellier (UMR ISEM), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-École Pratique des Hautes Études (EPHE), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Université de Montpellier (UM)-Institut de recherche pour le développement [IRD] : UR226-Centre National de la Recherche Scientifique (CNRS), and Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-École pratique des hautes études (EPHE)
- Subjects
General Mathematics ,F.2 ,0206 medical engineering ,G.2 ,approximability ,Inference ,02 engineering and technology ,Network topology ,parsimony ,Combinatorics ,03 medical and health sciences ,phylogenetic networks ,FOS: Mathematics ,Mathematics - Combinatorics ,[INFO]Computer Science [cs] ,Quantitative Biology - Populations and Evolution ,Integer programming ,030304 developmental biology ,Mathematics ,Discrete mathematics ,0303 health sciences ,Phylogenetic tree ,AMS subject classifications. 68W25, 05C20, 90C27, 92B10 ,software ,Populations and Evolution (q-bio.PE) ,Phylogenetic network ,Tree (graph theory) ,Maximum parsimony ,phylogenetic trees ,Tree rearrangement ,FOS: Biological sciences ,fixed-parameter tractability ,Combinatorics (math.CO) ,[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM] ,complexity ,Algorithm ,020602 bioinformatics - Abstract
International audience; Phylogenetic networks are used to display the relationship among different species whose evolution is not treelike, which is the case, for instance, in the presence of hybridization events or horizontal gene transfers. Tree inference methods such as maximum parsimony need to be modified in order to be applicable to networks. In this paper, we discuss two different definitions of maximum parsimony on networks, “hardwired” and “softwired,” and examine the complexity of computing them given a network topology and a character. By exploiting a link with the problem Multiterminal Cut, we show that computing the hardwired parsimony score for 2-state characters is polynomial-time solvable, while for characters with more states this problem becomes NP-hard but is still approximable and fixed parameter tractable in the parsimony score. On the other hand we show that, for the softwired definition, obtaining even weak approximation guarantees is already difficult for binary characters and restricted network topologies, and fixed-parameter tractable algorithms in the parsimony score are unlikely. On the positive side we show that computing the softwired parsimony score is fixed-parameter tractable in the level of the network, a natural parameter describing how tangled reticulate activity is in the network. Finally, we show that both the hardwired and the softwired parsimony scores can be computed efficiently using integer linear programming. The software has been made freely available.
- Published
- 2015
- Full Text
- View/download PDF
45. Species Tree Inference Using a Mixture Model
- Author
-
Ikram Ullah, Jens Lagergren, and Pekka Parviainen
- Subjects
Inference ,Biology ,Bioinformatics ,Set (abstract data type) ,Phylogenetics ,Computational phylogenetics ,Genetics ,Animals ,Cluster Analysis ,Humans ,ta518 ,Molecular Biology ,Phylogeny ,Ecology, Evolution, Behavior and Systematics ,ta515 ,ta113 ,ta112 ,Models, Genetic ,Phylogenetic tree ,ta213 ,business.industry ,Pattern recognition ,Mixture model ,Tree (data structure) ,Genes ,Tree rearrangement ,ta5141 ,Artificial intelligence ,business ,Algorithms - Abstract
Species tree reconstruction has been a subject of substantial research due to its central role across biology and medicine. A species tree is often reconstructed using a set of gene trees or by directly using sequence data. In either of these cases, one of the main confounding phenomena is the discordance between a species tree and a gene tree due to evolutionary events such as duplications and losses. Probabilistic methods can resolve the discordance by coestimating gene trees and the species tree but this approach poses a scalability problem for larger data sets. We present MixTreEM-DLRS: A two-phase approach for reconstructing a species tree in the presence of gene duplications and losses. In the first phase, MixTreEM, a novel structural expectation maximization algorithm based on a mixture model is used to reconstruct a set of candidate species trees, given sequence data for monocopy gene families from the genomes under study. In the second phase, PrIME-DLRS, a method based on the DLRS model (Åkerborg O, Sennblad B, Arvestad L, Lagergren J. 2009. Simultaneous Bayesian gene tree reconstruction and reconciliation analysis. Proc Natl Acad Sci U S A. 106(14):5714-5719), is used for selecting the best species tree. PrIME-DLRS can handle multicopy gene families since DLRS, apart from modeling sequence evolution, models gene duplication and loss using a gene evolution model (Arvestad L, Lagergren J, Sennblad B. 2009. The gene evolution model and computing its associated probabilities. J ACM. 56(2):1-44). We evaluate MixTreEM-DLRS using synthetic and biological data, and compare its performance with a recent genome-scale species tree reconstruction method PHYLDOG (Boussau B, Szöllősi GJ, Duret L, Gouy M, Tannier E, Daubin V. 2013. Genome-scale coestimation of species and gene trees. Genome Res. 23(2):323-330) as well as with a fast parsimony-based algorithm Duptree (Wehe A, Bansal MS, Burleigh JG, Eulenstein O. 2008. Duptree: a program for large-scale phylogenetic analyses using gene tree parsimony. Bioinformatics 24(13):1540-1541). Our method is competitive with PHYLDOG in terms of accuracy and runs significantly faster and our method outperforms Duptree in accuracy. The analysis constituted by MixTreEM without DLRS may also be used for selecting the target species tree, yielding a fast and yet accurate algorithm for larger data sets. MixTreEM is freely available at http://prime.scilifelab.se/mixtreem/.
- Published
- 2015
46. An Alternative Construction of Internodons: The Emergence of a Multi-level Tree of Life
- Author
-
D. J. Kornet, Arie de Bruin, and Samuel Alexander
- Subjects
Male ,Theoretical computer science ,K-ary tree ,Unary operation ,Genetic Speciation ,General Mathematics ,Tree of life (biology) ,Immunology ,Biology ,General Biochemistry, Genetics and Molecular Biology ,Animals ,Phylogeny ,General Environmental Science ,Pharmacology ,Models, Genetic ,Phylogenetic tree ,General Neuroscience ,Mathematical Concepts ,Tree (data structure) ,Tree structure ,Transformation (function) ,Computational Theory and Mathematics ,Tree rearrangement ,Female ,General Agricultural and Biological Sciences ,Algorithm ,Algorithms - Abstract
Internodons are a formalization of Hennig's concept of species. We present an alternative construction of internodons imposing a tree structure on the genealogical network. We prove that the segments (trivial unary trees) from this tree structure are precisely the internodons. We obtain the following spin-offs. First, the generated tree turns out to be an organismal tree of life. Second, this organismal tree is homeomorphic to the phylogenetic Hennigian species tree of life, implying the discovery of a multi-level tree of life: this phylogenetic tree can be obtained by zooming out from the organismal tree, or conversely, the organismal tree of life can be generated by expanding the phylogenetic nodes into unary trees. Finally, the definition of the organismal tree allows an efficient algorithmic transformation of a given genealogical network into its corresponding phylogenetic species tree of life. The latter will be presented in a separate paper.
- Published
- 2014
- Full Text
- View/download PDF
47. Practical Performance of Tree Comparison Metrics
- Author
-
Jon Yamato and Mary K. Kuhner
- Subjects
Mammals ,Discrete mathematics ,Tree distance ,Phylogenetic tree ,Contrast (statistics) ,Classification ,Tree (data structure) ,Data quality ,Tree rearrangement ,Statistics ,Metric (mathematics) ,Genetics ,Animals ,Computer Simulation ,Metric tree ,Phylogeny ,Ecology, Evolution, Behavior and Systematics ,Mathematics - Abstract
The phylogenetic literature contains numerous measures for assessing differences between two phylogenetic trees. Individual measures have been criticized on various grounds, but little is known about their comparative performance in typical applications. We evaluate the performance of nine tree distance measures on two tasks: 1) distinguishing trees separated by lesser versus greater numbers of recombinations, and 2) distinguishing trees inferred with lower versus higher quality data. We find that when the trees being compared are similar, measures that make use of branch lengths are superior, with the branch-length version of the Robinson-Foulds metric performing best. In contrast, for dissimilar trees topology- only measures are superior, with the Alignment metric of Nye et al. performing best. We also apply the measures to a mammalian dataset and observe that the best metric depends on whether branch-length information is of interest. We give practical recommendations for choosing a tree distance metric in different applications. (Phylogenetics; tree comparison; tree distance metrics.)
- Published
- 2014
- Full Text
- View/download PDF
48. Neighborhoods of Trees in Circular Orderings
- Author
-
Sarah, Bastkowski, Sarah, Baskowski, Vincent, Moulton, Andreas, Spillner, and Taoyang, Wu
- Subjects
Pharmacology ,Discrete mathematics ,Likelihood Functions ,Binary tree ,K-ary tree ,Models, Genetic ,General Mathematics ,General Neuroscience ,Immunology ,Weight-balanced tree ,Mathematical Concepts ,Interval tree ,General Biochemistry, Genetics and Molecular Biology ,Random binary tree ,Range tree ,Evolution, Molecular ,Combinatorics ,Tree traversal ,Computational Theory and Mathematics ,Tree rearrangement ,General Agricultural and Biological Sciences ,Algorithms ,Phylogeny ,General Environmental Science ,Mathematics - Abstract
In phylogenetics, a common strategy used to construct an evolutionary tree for a set of species $$X$$ is to search in the space of all such trees for one that optimizes some given score function (such as the minimum evolution, parsimony or likelihood score). As this can be computationally intensive, it was recently proposed to restrict such searches to the set of all those trees that are compatible with some circular ordering of the set $$X$$ . To inform the design of efficient algorithms to perform such searches, it is therefore of interest to find bounds for the number of trees compatible with a fixed ordering in the neighborhood of a tree that is determined by certain tree operations commonly used to search for trees: the nearest neighbor interchange (nni), the subtree prune and regraft (spr) and the tree bisection and reconnection (tbr) operations. We show that the size of such a neighborhood of a binary tree associated with the nni operation is independent of the tree’s topology, but that this is not the case for the spr and tbr operations. We also give tight upper and lower bounds for the size of the neighborhood of a binary tree for the spr and tbr operations and characterize those trees for which these bounds are attained.
- Published
- 2014
- Full Text
- View/download PDF
49. Evolutionary induction of global model trees with specialized operators and memetic extensions
- Author
-
Marcin Czajkowski and Marek Kretowski
- Subjects
Mathematical optimization ,Information Systems and Management ,Process (engineering) ,ComputingMethodologies_MISCELLANEOUS ,Computer Science::Neural and Evolutionary Computation ,Decision tree ,Evolutionary algorithm ,Computer Science Applications ,Theoretical Computer Science ,Set (abstract data type) ,Tree (data structure) ,Artificial Intelligence ,Control and Systems Engineering ,Tree rearrangement ,Greedy algorithm ,Metaheuristic ,Software ,Mathematics - Abstract
Metaheuristics, such as evolutionary algorithms ( EA s), have been successfully applied to the problem of decision tree induction. Recently, an EA was proposed to evolve model trees, which are a particular type of decision tree that is employed to solve regression problems. However, there is a need to specialize the EA s in order to exploit the full potential of evolutionary induction. The main contribution of this paper is a set of solutions and techniques that incorporates knowledge about the inducing problem for the global model tree into the evolutionary search. The objective of this paper is to demonstrate that specialized EA can find more accurate and less complex solutions to the traditional greedy-induced counterparts and the straightforward application of EA .This paper proposes a novel solution for each step of the evolutionary process and presents a new specialized EA for model tree induction called the Global Model Tree ( GMT ). An empirical investigation shows that trees induced by the GMT are one order of magnitude less complex than trees induced by popular greedy algorithms, and they are equivalent in terms of predictive accuracy with output models from straightforward implementations of evolutionary induction and state-of-the-art methods.
- Published
- 2014
- Full Text
- View/download PDF
50. treespace: Statistical exploration of landscapes of phylogenetic trees
- Author
-
Caroline Colijn, Michelle Kendall, Jacob Almagro-Garcia, Thibaut Jombart, Engineering & Physical Science Research Council (EPSRC), and Medical Research Council (MRC)
- Subjects
0106 biological sciences ,0301 basic medicine ,Biology ,Machine learning ,computer.software_genre ,Bioinformatics ,tree distances ,010603 evolutionary biology ,01 natural sciences ,Computer Programs ,Trees ,Set (abstract data type) ,Evolution, Molecular ,03 medical and health sciences ,Phylogenetics ,Computational phylogenetics ,Genetics ,Resource Article ,Ecology, Evolution, Behavior and Systematics ,Phylogeny ,Evolutionary Biology ,Phylogenetic tree ,business.industry ,software ,RESOURCE ARTICLES ,tree metric ,Computational Biology ,Phylogenetic network ,package ,06 Biological Sciences ,Tree (data structure) ,R package ,030104 developmental biology ,multivariate analysis ,incongruence ,Tree rearrangement ,Data Interpretation, Statistical ,Artificial intelligence ,business ,computer ,Biotechnology - Abstract
The increasing availability of large genomic data sets as well as the advent of Bayesian phylogenetics facilitates the investigation of phylogenetic incongruence, which can result in the impossibility of representing phylogenetic relationships using a single tree. While sometimes considered as a nuisance, phylogenetic incongruence can also reflect meaningful biological processes as well as relevant statistical uncertainty, both of which can yield valuable insights in evolutionary studies. We introduce a new tool for investigating phylogenetic incongruence through the exploration of phylogenetic tree landscapes. Our approach, implemented in the R package treespace, combines tree metrics and multivariate analysis to provide low‐dimensional representations of the topological variability in a set of trees, which can be used for identifying clusters of similar trees and group‐specific consensus phylogenies. treespace also provides a user‐friendly web interface for interactive data analysis and is integrated alongside existing standards for phylogenetics. It fills a gap in the current phylogenetics toolbox in R and will facilitate the investigation of phylogenetic results.
- Published
- 2017
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.