Gambette, Philippe, van Iersel, Leo, Jones, Mark, Lafond, Manuel, Pardi, Fabio, Scornavacca, Celine, Laboratoire d'Informatique Gaspard-Monge (LIGM), Centre National de la Recherche Scientifique (CNRS)-Fédération de Recherche Bézout-ESIEE Paris-École des Ponts ParisTech (ENPC)-Université Paris-Est Marne-la-Vallée (UPEM), Delft Institute of Applied Mathematics (TWA), Faculty of Electrical Engineering, Mathematics and Computer Science [Delft] (EEMCS)-Delft University of Technology (TU Delft), Department of Mathematics and Statistics [Ottawa], University of Ottawa [Ottawa], Institut de Biologie Computationnelle (IBC), Université de Montpellier (UM)-Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Méthodes et Algorithmes pour la Bioinformatique (MAB), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Institut des Sciences de l'Evolution de Montpellier (UMR ISEM), École pratique des hautes études (EPHE), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Université de Montpellier (UM)-Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Centre National de la Recherche Scientifique (CNRS)-Institut de recherche pour le développement [IRD] : UR226, CNRS PICS 230310 (CoCoAlSeq), NWO Vidi 639.072.602, NSERC PDF Grant, ANR-10-BINF-0001,ANCESTROME,Approche de phylogénie intégrative pour la reconstruction de génomes ancestraux(2010), European Project: 634650,H2020,H2020-PHC-2014-two-stage,VIROGENESIS(2015), Université Paris-Est Marne-la-Vallée (UPEM)-École des Ponts ParisTech (ENPC)-ESIEE Paris-Fédération de Recherche Bézout-Centre National de la Recherche Scientifique (CNRS), Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-École pratique des hautes études (EPHE), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Université de Montpellier (UM)-Institut de recherche pour le développement [IRD] : UR226-Centre National de la Recherche Scientifique (CNRS), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-École Pratique des Hautes Études (EPHE), Laboratoire d'Informatique Gaspard-Monge (ligm), University of Ottawa [Ottawa] (uOttawa), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-École pratique des hautes études (EPHE)-Université de Montpellier (UM)-Institut de recherche pour le développement [IRD] : UR226-Centre National de la Recherche Scientifique (CNRS), and ANR-10-BINF-01-01/10-BINF-0001,ANCESTROME,ANCESTROME(2010)
Phylogenetic tree reconstruction is usually done by local search heuristics that explore the space of the possible tree topologies via simple rearrangements of their structure. Tree rearrangement heuristics have been used in combination with practically all optimization criteria in use, from maximum likelihood and parsimony to distance-based principles, and in a Bayesian context. Their basic components are rearrangement moves that specify all possible ways of generating alternative phylogenies from a given one, and whose fundamental property is to be able to transform, by repeated application, any phylogeny into any other phylogeny. Despite their long tradition in tree-based phylogenetics, very little research has gone into studying similar rearrangement operations for phylogenetic network—that is, phylogenies explicitly representing scenarios that include reticulate events such as hybridization, horizontal gene transfer, population admixture, and recombination. To fill this gap, we propose “horizontal” moves that ensure that every network of a certain complexity can be reached from any other network of the same complexity, and “vertical” moves that ensure reachability between networks of different complexities. When applied to phylogenetic trees, our horizontal moves—named rNNI and rSPR—reduce to the best-known moves on rooted phylogenetic trees, nearest-neighbor interchange and rooted subtree pruning and regrafting. Besides a number of reachability results—separating the contributions of horizontal and vertical moves—we prove that rNNI moves are local versions of rSPR moves, and provide bounds on the sizes of the rNNI neighborhoods. The paper focuses on the most biologically meaningful versions of phylogenetic networks, where edges are oriented and reticulation events clearly identified. Moreover, our rearrangement moves are robust to the fact that networks with higher complexity usually allow a better fit with the data. Our goal is to provide a solid basis for practical phylogenetic network reconstruction., Author summary Phylogenetic networks are used to represent reticulate evolution, that is, cases in which the tree-of-life metaphor for evolution breaks down, because some of its branches have merged at one or several points in the past. This may occur, for example, when some organisms in the phylogeny are hybrids. In this paper, we deal with an elementary question for the reconstruction of phylogenetic networks: how to explore the space of all possible networks. The fundamental component for this is the set of operations that should be employed to generate alternative hypotheses for what happened in the past—which serve as basic blocks for optimization techniques such as hill-climbing. Although these approaches have a long tradition in classic tree-based phylogenetics, their application to networks that explicitly represent reticulate evolution is relatively unexplored. This paper provides the fundamental definitions and theoretical results for subsequent work in practical methods for phylogenetic network reconstruction: we subdivide networks into layers, according to a generally-accepted measure of their complexity, and provide operations that allow both to fully explore each layer, and to move across different layers. These operations constitute natural generalizations of well-known operations for the exploration of the space of phylogenetic trees, the lowest layer in the hierarchy described above.