1. An improved assembly and annotation of the allohexaploid wheat genome identifies complete families of agronomic genes and provides genomic evidence for chromosomal translocations
- Author
Gemy Kaithakottil, Lawrence Percival-Alwyn, Matthew D. Clark, Jonathan M. Wright, Dan Bolser, Heidrun Gundlach, Luca Venturini, Arnaud Kerhornou, Tom Barker, Darren Heavens, Manuel Spannagl, Philippa Borrill, Robert P. Davey, Ned Peel, Federica Di Palma, James Lipscombe, David Swarbreck, Aurore Coince, Owen Duncan, Georg Haberer, Christian Schudoma, Andrew L. Phillips, Cristobal Uauy, Christine Fosker, A. Harvey Millar, Ksenia V. Krasileva, Neil McKenzie, George Kettleborough, Gonzalo Garcia Accinelli, Dina Raats, Bernardo J. Clavijo, Josua Trösch, Paul J. Kersey, Helen Chapman, Guy Naamati, Michael W. Bevan, Ricardo H. Ramirez-Gonzalez, Goutai Yu, and Fu Hao Lu
- Subjects
0106 biological sciences; Resource; 0301 basic medicine; Bioinformatics; Translocation; Hybrid genome assembly; Genomics; Computational biology; Biology; 01 natural sciences; Medical and Health Sciences; Genome; Translocation, Genetic; Contig Mapping; DNA sequencing; Polyploidy; 03 medical and health sciences; Genetic; Polyploid; Genetics; Shotgun Sequence Assembly; Polymorphism; Gene; Triticum; Genetics (clinical); 030304 developmental biology; Plant Proteins; 2. Zero hunger; 0303 health sciences; Polymorphism, Genetic; Human Genome; food and beverages; Molecular Sequence Annotation; Genome project; Plant; Biological Sciences; Non-coding RNA; 030104 developmental biology; Algorithms; Genome, Plant; 010606 plant biology & botany; Reference genome; Biotechnology
- Abstract
Advances in genome sequencing and assembly technologies are generating many high-quality genome sequences, but assemblies of large, repeat-rich polyploid genomes, such as that of bread wheat, remain fragmented and incomplete. We have generated a new wheat whole-genome shotgun sequence assembly using a combination of optimised data types and an assembly algorithm designed to deal with large and complex genomes. The new assembly represents more than 78% of the genome, has a scaffold N50 of 88.8 kbp, and shows high fidelity to the input data. Our new annotation combines strand-specific Illumina RNAseq and PacBio full-length cDNAs to identify 104,091 high-confidence protein-coding genes and 10,156 non-coding RNA genes. We confirmed three known genome rearrangements and identified one novel rearrangement. Our approach enables the rapid and scalable assembly of wheat genomes, the identification of structural variants, and the definition of complete gene models, all powerful resources for trait analysis and breeding of this key global crop. [Supplemental material is available for this article.]
- Published
- 2017