1. New Genome Similarity Measures based on Conserved Gene Adjacencies
- Author
-
Daniel Doerr, Luis Antonio Brasil Kowada, Eloi Araujo, Shachi Deshpande, Bernard M. E. Moret, Jens Stoye, and Simone Dantas
- Subjects
0301 basic medicine ,0206 medical engineering ,Rearrangement ,Family-Free Genome Comparison ,02 engineering and technology ,Computational biology ,Biology ,Genes, Plant ,Genome ,Duplicate Genes ,Assignment ,Gene Connections ,03 medical and health sciences ,Similarity (network science) ,Genome Similarity Measure ,Conserved Adjacencies ,Gene Order ,Genetics ,Gene family ,Molecular Biology ,Gene ,Genome Rearrangements ,Phylogeny ,Biomedicine ,Comparative genomics ,Models, Genetic ,Phylogenetic tree ,Distance ,business.industry ,Number ,Computational Biology ,Permutations ,Genomics ,Genome project ,Algorithm ,Computational Mathematics ,030104 developmental biology ,Computational Theory and Mathematics ,Multigene Family ,Modeling and Simulation ,business ,Algorithms ,Genome, Plant ,020602 bioinformatics - Abstract
Many important questions in molecular biology, evolution, and biomedicine can be addressed by comparative genomic approaches. One of the basic tasks when comparing genomes is the definition of measures of similarity (or dissimilarity) between two genomes, for example, to elucidate the phylogenetic relationships between species. The power of different genome comparison methods varies with the underlying formal model of a genome. The simplest models impose the strong restriction that each genome under study must contain the same genes, each in exactly one copy. More realistic models allow several copies of a gene in a genome. One speaks of gene families, and comparative genomic methods that allow this kind of input are called gene family-based. The most powerful-but also most complex-models avoid this preprocessing of the input data and instead integrate the family assignment within the comparative analysis. Such methods are called gene family-free. In this article, we study an intermediate approach between family-based and family-free genomic similarity measures. Introducing this simpler model, called gene connections, we focus on the combinatorial aspects of gene family-free genome comparison. While in most cases, the computational costs to the general family-free case are the same, we also find an instance where the gene connections model has lower complexity. Within the gene connections model, we define three variants of genomic similarity measures that have different expression powers. We give polynomial-time algorithms for two of them, while we show NP-hardness for the third, most powerful one. We also generalize the measures and algorithms to make them more robust against recent local disruptions in gene order. Our theoretical findings are supported by experimental results, proving the applicability and performance of our newly defined similarity measures.
- Published
- 2017