1. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes
- Author
Nielsen, H.B., Almeida, M., Sierakowska Juncker, A., Rasmussen, S., Li, J., Sunagawa, S., Plichta, D.R., Gautier, L., Pedersen, A.G., Le Chatelier, E., Pelletier, E., Bonde, I., Nielsen, T., Manichanh, C., Arumugam, M., Batto, J.M., Quintanilha dos Santos, M.B., Blom, N., Borruel, N., Burgdorf, K.S., Boumezbeur, F., Casellas, F., Doré, J., Dworzynski, P., Guarner, F., Hansen, T., Hildebrand, F., Kaas, R.S., Kennedy, S., Kristiansen, K., Kultima, J.R., Leonard, P., Levenez, F., Lund, O., Moumen, B., Le Paslier, D., Pons, N., Pedersen, O., Prifti, E., Qin, J., Raes, J., Sørensen, S., Tap, J., Tims, S., Ussery, D.W., Yamada, T., Jamet, A., Mérieux, A., Cultrone, A., Torrejon, A., Quinquis, B., Brechot, C., Delorme, C., M'Rini, C., de Vos, W.M., Maguin, E., Varela, E., Guedon, E., Gwen, F., Haimet, F., Artiguenave, F., Vandemeulebrouck, G., Denariaz, G., Khaci, G., Blottière, H., Knol, J., Weissenbach, J., van Hylckama Vlieg, J.E., Torben, J., Parkhill, J., Turner, K., van de Guchte, M., Antolin, M., Rescigno, M., Kleerebezem, M., Derrien, M., Galleron, N., Sanchez, N., Grarup, N., Veiga, P., Oozeer, R., Dervyn, R., Layec, S., Bruls, T., Winogradski, Y., Zoetendal, E.G., Renault, D., Sicheritz-Ponten, T., Bork, P., Wang, J., Brunak, S., Ehrlich, S.D.
Center for Biological Sequence Analysis, Technical University of Denmark [Lyngby] (DTU), Novo Nordisk Foundation Center for Biosustainability, MICrobiologie de l'ALImentation au Service de la Santé (MICALIS), Institut National de la Recherche Agronomique (INRA)-AgroParisTech, Department of Computer Science [Baltimore], Johns Hopkins University (JHU), BGI Hong Kong Research Institute, BGI Shenzhen, School of Bioscience and Biotechnology, Southern University of Science and Technology [Shenzhen] (SUSTech), European Molecular Biology Laboratory, US 1367 MetaGénoPolis, Institut National de la Recherche Agronomique (INRA)-Département Microbiologie et Chaîne Alimentaire (MICA), Institut National de la Recherche Agronomique (INRA)-MetaGénoPolis (MGP), Genoscope - Centre national de séquençage [Evry] (GENOSCOPE), Direction de Recherche Fondamentale (CEA) (DRF (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay, Université d'Évry-Val-d'Essonne (UEVE), Novo Nordisk Foundation Center for Basic Metabolic Research (CBMR), Faculty of Health and Medical Sciences, University of Copenhagen = Københavns Universitet (KU)-University of Copenhagen = Københavns Universitet (KU), Digestive System Research Unit, Vall d'Hebron University Hospital [Barcelona], Faculty of Health Sciences, University of Southern Denmark (SDU), Department of Structural Biology, Flanders Institute for Biotechnology, Department of Bioscience Engineering, Vrije Universiteit [Brussels] (VUB), National Food Institute - Division for Epidemiology and Microbial Genomics, Department of Biology [Copenhagen], Faculty of Science [Copenhagen], Hagedorn Research Institute, Faculty of Health, Aarhus University [Aarhus], BGI Hong Kong Research Institute, Rega Institute - Department of Microbiology and Immunology, Université Catholique de Louvain (UCL), VIB Center for the Biology of Disease, Section of Microbiology [Copenhagen], University of Copenhagen = Københavns Universitet (KU)-University of Copenhagen = Københavns Universitet (KU)-Faculty of Science [Copenhagen], Laboratory of Microbiology, Wageningen University and Research Centre [Wageningen] (WUR), Department of Biological Information, Tokyo Institute of Technology [Tokyo] (TITECH), Max-Delbrück Center for Molecular Medicine, Princess Al Jawhara Center of Excellence in the Research of Hereditary Disorders, King Abdulaziz University, Centre for Host-Microbiome Interactions, Dental Institute Central Office, Guy’s Hospital, King’s College London, Département Microbiologie et Chaîne Alimentaire (MICA), Institut National de la Recherche Agronomique (INRA), European Community's Seventh Framework Programme [FP7-HEALTH-F4-2007-201052, FP7-HEALTH-2010-261376], OpenGPU FUI collaborative research projects, DGCIS, Instituto de Salud Carlos III (Spain), Ministere de la Recherche et de l'Education Nationale (France), [ANR-11-DPBS-0001], Danmarks Tekniske Universitet = Technical University of Denmark (DTU), Beijing Genomics Institute [Shenzhen] (BGI), Southern University of Science and Technology (SUSTech), MetaGenoPolis, Université Paris-Saclay-Direction de Recherche Fondamentale (CEA) (DRF (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA), University of Copenhagen = Københavns Universitet (UCPH)-University of Copenhagen = Københavns Universitet (UCPH), Vrije Universiteit Brussel (VUB), Université Catholique de Louvain = Catholic University of Louvain (UCL), University of Copenhagen = Københavns Universitet (UCPH)-University of Copenhagen = Københavns Universitet (UCPH)-Faculty of Science [Copenhagen], Wageningen University and Research [Wageningen] (WUR), Max Delbrück Center for Molecular Medicine [Berlin] (MDC), Helmholtz-Gemeinschaft = Helmholtz Association, European Project: 201052,EC:FP7:HEALTH,FP7-HEALTH-2007-A,METAHIT(2008), Department of Systems Biology, Center for Biological Sequence Analysis, Ctr Biol Sequence Anal, National University of Singapore (NUS), European Molecular Biology Laboratory [Heidelberg] (EMBL), Department of Mathematics and Computer Science [Odense] (IMADA), Génomique métabolique (UMR 8030), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-Direction de Recherche Fondamentale (CEA) (DRF (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université d'Évry-Val-d'Essonne (UEVE)-Centre National de la Recherche Scientifique (CNRS), Vall d’Hebron Research Institute (VHIR), Faculty of Health and Medical Sciences, The Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen = Københavns Universitet (KU), INRA US1367 MetaGenoPolis, European Molecular Biology Laboratory [Grenoble] (EMBL), Unité de Recherche sur les Maladies Cardiovasculaires, du Métabolisme et de la Nutrition = Institute of cardiometabolism and nutrition (ICAN), Université Pierre et Marie Curie - Paris 6 (UPMC)-Assistance publique - Hôpitaux de Paris (AP-HP) (APHP)-Institut National de la Santé et de la Recherche Médicale (INSERM)-CHU Pitié-Salpêtrière [APHP], Center for Biological Sequence Analysis [Lyngby], Chinese Academy of Agricultural Mechanization Sciences (CCCME), Génétique Microbienne, INRA, Domaine de Vilvert, 78352 Jouy en Josas Cedex, and Department of Bio-engineering Sciences
- Subjects
Cellular immunity ,polypeptide ,[SDV]Life Sciences [q-bio] ,SHORT READ ALIGNMENT SEQUENCES SYSTEMS ALGORITHMS MICROBIOTA PROTEIN LIFE SETS TREE TOOL ,complex metagenomic sample ,Applied Microbiology and Biotechnology ,Genome ,Microbiologie ,Databases, Genetic ,genetic element ,Cluster Analysis ,sets ,short read alignment ,ComputingMilieux_MISCELLANEOUS ,Genetics ,0303 health sciences ,tool ,metagenomic ,tree ,Lactococcus lactis ,IL-12 ,Molecular Medicine ,Biotechnology ,life ,Microbial Genomes ,antigen specific immune response ,Biomedical Engineering ,Bioengineering ,Computational biology ,[SDV.BID]Life Sciences [q-bio]/Biodiversity ,cellular immunity ,Biology ,algorithms ,Microbiology ,03 medical and health sciences ,Genetic variation ,microbiota ,Microbiome ,Gene ,genome ,030304 developmental biology ,adjuvant activity ,VLAG ,030306 microbiology ,Metagenomics ,WIAS ,Microbial genetics ,sequences ,systems ,protein
- Abstract
Most current approaches for analyzing metagenomic data rely on comparisons to reference genomes, but the microbial diversity of many environments extends far beyond what is covered by reference databases. De novo segregation of complex metagenomic data into specific biological entities, such as particular bacterial strains or viruses, remains a largely unsolved problem. Here we present a method, based on binning co-abundant genes across a series of metagenomic samples, that enables comprehensive discovery of new microbial organisms, viruses and co-inherited genetic entities and aids assembly of microbial genomes without the need for reference sequences. We demonstrate the method on data from 396 human gut microbiome samples and identify 7,381 co-abundance gene groups (CAGs), including 741 metagenomic species (MGS). We use these to assemble 238 high-quality microbial genomes and identify affiliations between MGS and hundreds of viruses or genetic entities. Our method provides the means for comprehensive profiling of the diversity within complex metagenomic samples.
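The co-abundance idea in the abstract can be illustrated with a minimal Python sketch, assuming that genes carried by the same organism show strongly correlated abundance across samples. The function names, the toy profiles, the 0.9 Pearson threshold and the greedy seed-based grouping are all illustrative assumptions, not the authors' published procedure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no co-abundance signal
    return cov / sqrt(vx * vy)


def bin_coabundant_genes(profiles, min_corr=0.9):
    """Greedily group genes whose profiles correlate with a seed gene.

    `profiles` maps a gene name to its abundance in each sample.
    `min_corr` is an assumed cutoff chosen for illustration only.
    """
    unassigned = list(profiles)  # gene names in insertion order
    groups = []
    while unassigned:
        seed = unassigned.pop(0)  # next unassigned gene seeds a new group
        group, remaining = [seed], []
        for gene in unassigned:
            if pearson(profiles[seed], profiles[gene]) >= min_corr:
                group.append(gene)
            else:
                remaining.append(gene)
        unassigned = remaining
        groups.append(group)
    return groups


# Hypothetical toy data: abundance of four genes across five samples.
profiles = {
    "geneA": [10, 0, 5, 20, 1],
    "geneB": [21, 1, 9, 41, 2],
    "geneC": [0, 8, 7, 0, 9],
    "geneD": [1, 15, 14, 0, 19],
}
groups = bin_coabundant_genes(profiles)
print(groups)
```

In this toy input, geneA and geneB rise and fall together across samples, as do geneC and geneD, so each pair ends up in its own group. Real data would need normalization and a more robust clustering strategy than this greedy pass.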
- Published
- 2014