Aurélien J. Doucet, Christophe Hitte, Elzbieta Sliwerska, Bridget Myers, Thomas Derrien, Julia V. Halo, Feichen Shen, Laura E. Kirby, Amanda L. Pendleton, John V. Moran, Adam R. Boyko, Jeffrey M. Kidd, and S. Emery
Affiliations: Bowling Green State University (BGSU); University of Michigan [Ann Arbor], University of Michigan System; Institut de Recherche sur le Cancer et le Vieillissement (IRCAN), Université Nice Sophia Antipolis (1965 - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA); Institut de Génétique et Développement de Rennes (IGDR), Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Centre National de la Recherche Scientifique (CNRS)-Structure Fédérative de Recherche en Biologie et Santé de Rennes (Biosit: Biologie - Santé - Innovation Technologique); Cornell University [New York]
Funding: R15GM122028, National Institute of General Medical Sciences; T32HG00040, National Human Genome Research Institute
Significance
Advancements in long-read DNA sequencing technologies provide more comprehensive views of genomes. We used long-read sequences to assemble a Great Dane dog genome that provides several improvements over the existing reference derived from a Boxer. Assembly comparisons revealed that gaps in the Boxer assembly often occur at the beginning of protein-coding genes and have a high-GC content, which likely reflects limitations of previous technologies in resolving GC-rich sequences. Dimorphic LINE-1 and SINEC retrotransposons represent the predominant differences between the Great Dane and Boxer assemblies. Proof-of-principle experiments demonstrated that expression of a canine LINE-1 could promote the retrotransposition of itself and a SINEC_Cf consensus sequence in cultured human cells. Thus, ongoing retrotransposon activity is a major contributor to canine genetic diversity.

Abstract
Technological advances have allowed improvements in genome reference sequence assemblies. Here, we combined long- and short-read sequence resources to assemble the genome of a female Great Dane dog. This assembly has improved continuity compared to the existing Boxer-derived (CanFam3.1) reference genome. Annotation of the Great Dane assembly identified 22,182 protein-coding gene models and 7,049 long noncoding RNAs, including 49 protein-coding genes not present in the CanFam3.1 reference. The Great Dane assembly spans the majority of sequence gaps in the CanFam3.1 reference and illustrates that 2,151 gaps overlap the transcription start site of a predicted protein-coding gene. Moreover, a subset of the resolved gaps, which have an 80.95% median GC content, localize to transcription start sites and recombination hotspots more often than expected by chance, suggesting the stable canine recombinational landscape has shaped genome architecture.
Alignment of the Great Dane and CanFam3.1 assemblies identified 16,834 deletions and 15,621 insertions, as well as 2,665 deletions and 3,493 insertions located on secondary contigs. These structural variants are dominated by retrotransposon insertion/deletion polymorphisms and include 16,221 dimorphic canine short interspersed elements (SINECs) and 1,121 dimorphic long interspersed element-1 sequences (LINE-1_Cfs). Analysis of sequences flanking the 3′ end of LINE-1_Cfs (i.e., LINE-1_Cf 3′-transductions) suggests multiple retrotransposition-competent LINE-1_Cfs segregate among dog populations. Consistent with this conclusion, we demonstrate that a canine LINE-1_Cf element with intact open reading frames can retrotranspose its own RNA and that of a SINEC_Cf consensus sequence in cultured human cells, implicating ongoing retrotransposon activity as a driver of canine genetic variation.
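
The abstract summarizes the resolved gaps by their median GC content. As a minimal sketch of how such a statistic might be computed, the Python below takes a list of gap-filling sequences, calculates the GC fraction of each, and reports the median as a percentage. The function names (`gc_fraction`, `median_gc_percent`) are hypothetical, the choice to ignore ambiguous bases such as `N` is an assumption, and the sequences are toy inputs, not data from the study; the source does not describe the authors' actual procedure.

```python
from statistics import median


def gc_fraction(seq: str) -> float:
    """Return the fraction of unambiguous bases (A, C, G, T) that are G or C.

    Assumption: ambiguous bases such as N are ignored, and a sequence with
    no unambiguous bases is scored as 0.0.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def median_gc_percent(seqs) -> float:
    """Return the median per-sequence GC content, expressed as a percentage."""
    return 100.0 * median(gc_fraction(s) for s in seqs)


# Toy sequences for illustration only (not taken from either assembly).
toy_gap_sequences = ["GGCC", "GCGA", "ATAT"]
print(median_gc_percent(toy_gap_sequences))
```

Taking the median of per-sequence values, rather than pooling all bases, keeps a few very long gap sequences from dominating the summary.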