Ivonne Mendez, Maria Mendez-Lago, Alfredo Villasante, Lidiya V. Boldyreva, Joseph W. Carlson, Kenneth H. Wan, Jacqueline E. Schein, Benjamin W. Booth, Reed A. George, Evgeniya N. Andreyeva, Gary H. Karpen, Patrizio Dimitri, Maria Carmela Accardo, A. Bernardo Carvalho, Robert Svirskas, Marco A. Marra, Soo Park, Martin Krzywinski, Elisabetta Damia, Olga V. Demakova, Giovanni Messina, Beatriz de Pablos, Igor F. Zhimulev, Gerald M. Rubin, Roger A. Hoskins, Samuel E. Galle, Susan E. Celniker, Barret D. Pfeiffer, Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro, Conselho Nacional de Desenvolvimento Científico e Tecnológico (Brasil), Fondazione Cenci Bolognetti, University of California, Ministry of Education and Science of the Russian Federation, Ministerio de Economía y Competitividad (España)
Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads.

This work was supported by NIH grants P50 HG00750 (G.M.R.), R01 HG00747 (G.H.K.), and R01 HG002673 (S.E.C.) and performed under U.S. Department of Energy Contracts DE-AC03-76SF00098 and DE-AC02-05CH11231, University of California. I.F.Z. was supported by grant 13-04-40137 from the Russian Federation; E.N.A. was supported by grant 12-04-00874-a from the Russian Federation; P.D. was supported by a grant from the Istituto Pasteur-Fondazione Cenci Bolognetti; A.V. was supported by Ministerio de Economía y Competitividad grant BFU2011-30295-C02-01; and A.B.C. was supported by NIH grant R01 GM064590 and grants from Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro (FAPERJ) and the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).