1. Development of an Analysis Pipeline Characterizing Multiple Hypervariable Regions of 16S rRNA Using Mock Samples
- Author
Nancy J. Ames, Natalia I. Chalmers, Andrew J. Oler, Peter J. Munson, Hyungsuk Kim, Jennifer J. Barb, Gwenyth R. Wallen, and Ann K. Cashion
- Subjects
0301 basic medicine; lcsh:Medicine; Biochemistry; RNA, Ribosomal, 16S; Gene Order; Genomic library; lcsh:Science; Genetics; Multidisciplinary; Ecology; High-Throughput Nucleotide Sequencing; Genomics; Amplicon; Nucleic acids; Shannon Index; Ribosomal RNA; Medical Microbiology; Research Article; Cell biology; Cellular structures and organelles; Ecological Metrics; Sequence analysis; 030106 microbiology; Computational biology; Microbial Genomics; Biology; Microbiology; 03 medical and health sciences; Operon; DNA Barcoding, Taxonomic; Non-coding RNA; Operons; Biology and life sciences; Bacteria; lcsh:R; Ecology and Environmental Sciences; Organisms; Computational Biology; Genetic Variation; Species Diversity; Ion semiconductor sequencing; DNA; Sequence Analysis, DNA; 16S ribosomal RNA; Genome Analysis; Genomic Libraries; Hypervariable region; 030104 developmental biology; Metagenomics; Genetic Loci; RNA; lcsh:Q; Microbiome; Ribosomes
- Abstract
Objectives: There is much speculation about which hypervariable region provides the highest bacterial specificity in 16S rRNA sequencing. The best way to prevent bias and obtain a comprehensive view of complex bacterial communities would be to sequence the entire 16S rRNA gene; however, this is not possible with standard second-generation library design and short-read next-generation sequencing technology.
Methods: This paper examines a new process in which seven hypervariable (V) regions of the 16S rRNA gene, covered by six amplicons (V2, V3, V4, V6-7, V8, and V9), are processed simultaneously on the Ion Torrent Personal Genome Machine (Life Technologies, Grand Island, NY). Four mock samples were amplified using the 16S Ion Metagenomics Kit™ (Life Technologies), and their sequencing data were subjected to a novel analytical pipeline.
Results: Results are presented at the family and genus levels. The Kullback-Leibler divergence (DKL), a measure of how far the computed bacterial distribution departs from the nominal distribution in the mock samples, was used to infer which region performed best at each level. Three hypervariable regions, V2, V4, and V6-7, produced the lowest divergence from the known mock sample composition. The V4 region gave the lowest (best) average DKL, while the V9 region gave the highest (worst). In addition to having a high DKL, the V9 region performed worst in both the forward and reverse directions, finding only 17% and 53% of the known families and 12% and 47% of the known genera, whereas the forward and reverse V4 reads identified all 17 families.
Conclusions: Our analysis shows that the sequencing method, which uses six amplicons covering hypervariable regions of the 16S rRNA gene, and the subsequent analysis are valid. The method also allowed the performance of each variable region to be assessed simultaneously. Our findings will provide the basis for future work intended to assess microbial abundance at different time points throughout a clinical protocol.
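The two metrics reported in the abstract, DKL against the nominal mock composition and the share of known taxa recovered, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The direction of the divergence (nominal relative to observed), the natural logarithm, the pseudocount used to keep absent taxa from producing an infinite value, and the names `kl_divergence` and `fraction_detected` are all assumptions made for illustration; the abstract does not specify them. The abundances are toy values, not data from the study.

```python
import math


def normalize(values):
    """Scale non-negative abundances so that they sum to 1."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return {taxon: value / total for taxon, value in values.items()}


def smoothed(dist, taxa, pseudocount):
    """Proportions over `taxa` with a small pseudocount added (assumption)."""
    proportions = normalize(dist)
    return normalize({t: proportions.get(t, 0.0) + pseudocount for t in taxa})


def kl_divergence(nominal, observed, pseudocount=1e-6):
    """D_KL(nominal || observed) in nats; the direction is an assumption."""
    taxa = set(nominal) | set(observed)
    p = smoothed(nominal, taxa, pseudocount)
    q = smoothed(observed, taxa, pseudocount)
    return sum(p[t] * math.log(p[t] / q[t]) for t in taxa)


def fraction_detected(nominal, observed):
    """Share of taxa present in the nominal sample that were also observed."""
    expected = [t for t, value in nominal.items() if value > 0]
    if not expected:
        raise ValueError("nominal sample contains no taxa")
    found = [t for t in expected if observed.get(t, 0) > 0]
    return len(found) / len(expected)


# Toy mock community: four families at equal nominal abundance.
nominal = {
    "Staphylococcaceae": 0.25,
    "Streptococcaceae": 0.25,
    "Enterobacteriaceae": 0.25,
    "Bacillaceae": 0.25,
}

# Hypothetical read counts from two regions (illustrative values only).
region_a = {
    "Staphylococcaceae": 240,
    "Streptococcaceae": 260,
    "Enterobacteriaceae": 250,
    "Bacillaceae": 250,
}
region_b = {"Staphylococcaceae": 900, "Streptococcaceae": 100}

for name, counts in (("region_a", region_a), ("region_b", region_b)):
    print(name, kl_divergence(nominal, counts), fraction_detected(nominal, counts))
```

A region whose read distribution tracks the nominal composition yields a DKL near zero, while a region that misses taxa or skews their abundances yields a larger value, which is the sense in which a lower DKL is described as better above.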
- Published
- 2016