1. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species
- Author
Dent Earl, Jacob O. Kitzman, Iain MacCallum, James R. Knight, Jacques Corbeil, Elenie Godzaridis, Cristian Del Fabbro, Paul J. Kersey, Martin Hunt, Octávio S. Paulo, Joseph Fass, Isaac Ho, Michael C. Schatz, Erich D. Jarvis, Dominique Lavenier, Simone Scalabrin, Thomas D. Otto, Nicolas Maillet, Siu-Ming Yiu, Timothy I. Shaw, David B. Jaffe, Henry Song, Ruibang Luo, Steve Goldstein, David Haussler, Francisco Pina-Martins, Richard A. Gibbs, Adam M. Phillippy, Michael Bechner, Ganeshkumar Ganapathy, Stephen Richards, Riccardo Vicedomini, Shuangye Yin, François Laviolette, Yingrui Li, T. Roderick Docking, Binghang Liu, Carson Qu, Wen-Chi Chou, Hao Zhang, Nuno A. Fonseca, Dariusz Przybylski, Bruno Vieira, Yue Liu, Matthew D. MacManes, Sébastien Boisvert, Yujian Shi, Jared T. Simpson, Sergey Kazakov, Sergey Koren, Jarrod Chapman, Giles Hall, Paul Baranay, Sante Gnerre, Shiguo Zhou, Rayan Chikhi, Filipe J. Ribeiro, Jason T. Howard, Zhenyu Li, Pavel Fedotov, Jay Shendure, J. Graham Ruby, Joseph B. Hiatt, Benedict Paten, Ian F Korf, David C. Schwartz, Keith Bradnam, Jianying Yuan, Alexey Sergushichev, Jun Wang, Hamidreza Chitsaz, Daniel S. Rokhsar, Inanc Birol, Huaiyang Jiang, Kim C. Worley, Anton Alexandrov, Zemin Ning, Delphine Naquin, Michael Place, Matthias Haimel, Guojie Zhang, Guillaume Chapuis, Fedor Tsarev, Scott J. Emrich, Shaun D. Jackman, Sergey Melnikov, Xiang Qin, Ted Sharpe, Francesco Vezzi, Tak-Wah Lam, Richard Durbin, Genome Center [UC Davis], University of California [Davis] (UC Davis), University of California (UC)-University of California (UC), National Research University of Information Technologies, Mechanics and Optics [St. 
Petersburg] (ITMO), Computational Biology and Bioinformatics [New Haven], Yale University [New Haven], Laboratory for Molecular and Computational Genomics [Madison], University of Wisconsin-Madison, Genome Sciences Centre [Vancouver] (GSC), British Columbia Cancer Agency, Infectious Diseases Research Center [Québec], Université Laval [Québec] (ULaval), Faculté de médecine de l'Université Laval [Québec] (ULaval), DOE Joint Genome Institute [Walnut Creek], Biological systems and models, bioinformatics and sequences (SYMBIOSE), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria), Dependability Interoperability and perfOrmance aNalYsiS Of networkS (DIONYSOS), Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-RÉSEAUX, TÉLÉCOMMUNICATION ET SERVICES (IRISA-D2), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique 
(CNRS)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS), Department of computer science [Detroit], Wayne State University [Detroit], Institute of Bioinformatics [Athens], University of Georgia [USA], Institute of Aging Research [Boston], Hebrew SeniorLife [Boston], Department of Molecular Medicine [Québec], Institute of Applied Genomics [Udine] (IGA), Institute of Applied Genomics, The Wellcome Trust Sanger Institute [Cambridge], Howard Hughes Medical Institute [Santa Cruz] (HHMI), Howard Hughes Medical Institute (HHMI)-University of California [Santa Cruz] (UC Santa Cruz), Department of Computer Science and Engineering [South Bend], University of Notre Dame [Indiana] (UND), European Bioinformatics Institute [Hinxton] (EMBL-EBI), EMBL Heidelberg, CRACS & Inesc TEC [Porto], Universidade do Porto = University of Porto, Medical Center [Durham], Duke University [Durham], Human Genome Sequencing Center [Houston] (HGSC), Baylor College of Medicine (BCM), Baylor University-Baylor University, Broad Institute [Cambridge], Harvard University-Massachusetts Institute of Technology (MIT), Faculty of Medicine, Department of Genome Sciences [Seattle] (GS), University of Washington [Seattle], 454 Life Sciences [Branford], 454 Life Sciences, National Biodefense Analysis and 
Countermeasures Center [Frederick], U.S. Social Security Administration, Center for Bioinformatics and Computational Biology [Maryland] (CBCB), University of Maryland [College Park], University of Maryland System-University of Maryland System, HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory [Hong Kong], The University of Hong Kong (HKU), Invariant Preserving SOlvers (IPSO), Institut de Recherche Mathématique de Rennes (IRMAR), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-École normale supérieure - Rennes (ENS Rennes)-Université de Rennes 2 (UR2)-Centre National de la Recherche Scientifique (CNRS)-INSTITUT AGRO Agrocampus Ouest, Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Department of Computer Science and Software Engineering [Québec], Beijing Genomics Institute [Shenzhen] (BGI), Berkeley California Institute for Quantitative Biosciences [Berkeley], University of California (UC), Computational Biology & Population Genomics Group [Lisboa], Centre for Environmental Biology, New York Genome Center [New York], New York Genome Center, Department of Molecular and Cell Biology, Department of Biochemistry and Biophysics, Howard Hughes Medical Institute (HHMI), Simons Center for Quantitative Biology [Cold Spring Harbor], Cold 
Spring Harbor Laboratory, Department of Epidemiology and Biostatistics [Athens], University of Georgia [USA]-College of Public Health, Science for Life Laboratory [Solna], Royal Institute of Technology [Stockholm] (KTH ), Department of Mathematics and Computer Science [Udine], Università degli Studi di Udine - University of Udine [Italie], University of California-University of California, Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes 1 (UR1), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique, CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Université de Bretagne Sud (UBS)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UNIV-RENNES)-Université de Rennes 
(UNIV-RENNES)-École normale supérieure - Rennes (ENS Rennes)-Université de Bretagne Sud (UBS)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA), Howard Hughes Medical Institute (HHMI)-University of California [Santa Cruz] (UCSC), Universidade do Porto, Harvard University [Cambridge]-Massachusetts Institute of Technology (MIT), AGROCAMPUS OUEST, Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Université de Rennes 2 (UR2), Université de Rennes (UNIV-RENNES)-École normale supérieure - Rennes (ENS Rennes)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-AGROCAMPUS OUEST, Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Inria Rennes – Bretagne Atlantique, and University of California
- Subjects
[INFO.INFO-AR]Computer Science [cs]/Hardware Architecture [cs.AR], Computer science, Sequence assembly, GENOMES, Health Informatics, Computational biology, Assessment, COMPASS, Genome, 03 medical and health sciences, 0302 clinical medicine, biology.animal, Quantitative Biology - Genomics, Gene, 030304 developmental biology, Sequence (medicine), Scaffolds, Genomics (q-bio.GN), Whole genome sequencing, 0303 health sciences, Genome assembly, Heterozygosity, biology, N50, Research, Vertebrate, De novo Assembly, Computer Science Applications, Fosmid, FOS: Biological sciences, Scalability, 030217 neurology & neurosurgery
- Abstract
Background - The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how best to assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly.
Results - In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies.
Conclusions - Many current genome assemblers produced useful assemblies, containing a significant representation of their genes, regulatory sequences, and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
Comment - Additional files available at http://korflab.ucdavis.edu/Datasets/Assemblathon/Assemblathon2/Additional_files/ Major changes: 1. Accessions for the 3 read data sets have now been included. 2. New file: spreadsheet containing details of all Study, Sample, Run, & Experiment identifiers. 3. Made miscellaneous changes to address reviewers' comments. DOIs added to GigaDB datasets.
- Published
- 2013