Quality control of microbiota metagenomics by k-mer analysis

Authors :
Plaza Onate, Florian
Batto, Jean-Michel
Juste, Catherine
Fadlallah, Jehane
Fougeroux, Cyrielle
Gouas, Doriane
Pons, Nicolas
Kennedy, Sean
Levenez, Florence
Doré, Joël
Dusko Ehrlich, S.
Gorochov, Guy
Larsen, Martin
INRA US1367 MetaGenoPolis
MICrobiologie de l'ALImentation au Service de la Santé (MICALIS)
Institut National de la Recherche Agronomique (INRA)-AgroParisTech
Service d'Immunologie [CHU Pitié-Salpêtrière]
Assistance Publique - Hôpitaux de Paris (AP-HP), CHU Pitié-Salpêtrière
Centre d'Immunologie et de Maladies Infectieuses (CIMI)
Université Pierre et Marie Curie - Paris 6 (UPMC)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS)
The study was funded by INSERM, the University Pierre et Marie Curie 'ÉMERGENCE' program, the Fondation pour l’Aide à la Recherche sur la Sclérose en Plaques (ARSEP), ARTHRITIS Fondation Courtin and the Agence Nationale de la Recherche (ANR).
The authors acknowledge the funding agencies and the volunteers providing samples for the study.
Source :
BMC Genomics, BioMed Central, 2015, 16 (1), pp. 183. ⟨10.1186/s12864-015-1406-7⟩, www.biomedcentral.com/bmcgenomics
Publication Year :
2015
Publisher :
HAL CCSD, 2015.

Abstract

Background
The biological and clinical consequences of the tight interactions between host and microbiota are rapidly being unraveled by next-generation sequencing technologies and sophisticated bioinformatics, also referred to as microbiota metagenomics. The recent success of metagenomics has created a demand to rapidly apply the technology to large case–control cohort studies and to studies of microbiota from various habitats, including habitats relatively poor in microbes. It is therefore of foremost importance to enable a robust and rapid quality assessment of metagenomic data from samples that challenge present technological limits (sample numbers and size). Here we demonstrate that the distribution of overlapping k-mers in metagenome sequence data predicts sequence quality, as defined by gene distribution and by the efficiency of sequence mapping to a reference gene catalogue.

Results
We used serial dilutions of gut microbiota metagenomic datasets to generate well-defined high- to low-quality metagenomes. We also analyzed a collection of 52 microbiota-derived metagenomes. We demonstrate that k-mer distributions of metagenomic sequence data identify sequence contamination, such as sequences derived from “empty” ligation products. Notably, k-mer distributions also predicted the frequency of sequences mapping to a reference gene catalogue, not only for the well-defined serial dilution datasets but also for the 52 human gut microbiota-derived metagenomic datasets.

Conclusions
We propose that k-mer analysis of raw metagenome sequence reads be implemented as a first quality assessment prior to more extensive bioinformatics analysis, such as sequence filtering and gene mapping. With the rising demand for metagenomic analysis of microbiota, it is crucial to provide tools for rapid and efficient decision making. This will eventually lead to faster turnaround times, improved analytical quality (including sample quality metrics) and a significant cost reduction. Finally, improved quality assessment will have a major impact on the robustness of the biological and clinical conclusions drawn from metagenomic studies.

Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1406-7) contains supplementary material, which is available to authorized users.
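The abstract does not describe how the k-mer distributions are computed. What follows is a minimal Python sketch of the general idea, written under stated assumptions. It counts every overlapping k-mer (step size 1) in a set of raw reads. It then builds the k-mer frequency spectrum, meaning how many distinct k-mers occur exactly m times. Finally, it reports the share of all k-mer occurrences taken up by the few most frequent k-mers. The function names count_kmers, kmer_spectrum and top_kmer_fraction are hypothetical. Skipping k-mers that contain 'N' is an assumption. The "top k-mer fraction" is an illustrative contamination heuristic and not the authors' published method.

```python
from collections import Counter
from typing import Iterable


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer (step 1) in the given reads.

    Assumption: k-mers containing an ambiguous base 'N' are skipped.
    """
    counts: Counter = Counter()
    for read in reads:
        seq = read.upper()
        # A read of length L yields L - k + 1 overlapping k-mers (none if L < k).
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def kmer_spectrum(counts: Counter) -> dict[int, int]:
    """Map each multiplicity m to the number of distinct k-mers seen exactly m times."""
    return dict(sorted(Counter(counts.values()).items()))


def top_kmer_fraction(counts: Counter, top_n: int) -> float:
    """Share of all k-mer occurrences accounted for by the top_n most frequent k-mers.

    Hypothetical heuristic: a high value means a few short sequences dominate the
    library, as one might expect from artefacts such as "empty" ligation products.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(c for _, c in counts.most_common(top_n))
    return top / total


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGT", "TTGCAAGC"]
    counts = count_kmers(reads, k=4)
    print(kmer_spectrum(counts))         # {1: 5, 2: 3, 4: 1}
    print(top_kmer_fraction(counts, 1))  # 4 / 15, about 0.267
```

In this toy input the duplicated read pushes "ACGT" to multiplicity 4. On real data, a library dominated by artefactual sequences would similarly show a heavy tail in the spectrum and a large top-k-mer share. Realistic read sets would call for a dedicated k-mer counter rather than a Python Counter, and any decision threshold would have to be calibrated against data such as the serial dilutions described above.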

Details

Language :
English
ISSN :
1471-2164
Database :
OpenAIRE
Journal :
BMC Genomics
Accession number :
edsair.pmid.dedup....29a9a9bff4ecf44a6ad472f3689d70e8