Pipeline for amplifying and analyzing amplicons of the V1–V3 region of the 16S rRNA gene.
- Author
- Allen, Heather K., Bayles, Darrell O., Looft, Torey, Trachsel, Julian, Bass, Benjamin E., Alt, David P., Bearson, Shawn M. D., Nicholson, Tracy, and Casey, Thomas A.
- Subjects
- *NUCLEIC acids, *RIBOSOMAL RNA, *MICROBIAL ecology, *NUCLEOTIDE sequence, *BIG data
- Abstract
- Background: Profiling of 16S rRNA gene sequences is an important tool for testing hypotheses about complex microbial communities, and analysis methods must be updated and validated as sequencing technologies advance. In host-associated bacterial communities, the V1–V3 region of the 16S rRNA gene is a valuable region to profile because it provides a useful level of taxonomic resolution; however, the use of Illumina MiSeq data for experiments targeting this region needs validation.
Results: Using a MiSeq machine and the version 3 (300 × 2) chemistry, we sequenced the V1–V3 region of the 16S rRNA gene within a mock community. The mock community comprised 19 bacteria and one archaeon, and 12 replicate amplifications of the community were performed and sequenced. Sequencing the large fragment (490 bp) that encompasses V1–V3 yielded a higher error rate (3.6%) than has been reported for smaller fragments. This higher error rate was due to a large number of sequences that occurred only once or twice across all mock community samples. Removing sequences that occurred only once across all samples (singletons) reduced the error rate to 1.4%. Diversity estimates for the mock community were inflated when all sequences were included, whereas estimates following singleton removal more closely reflected the actual mock community membership. A higher percentage of the sequences could be taxonomically assigned after singleton and doubleton sequences were removed, and the assignments reflected the membership of the input DNA.
Conclusions: Sequencing the V1–V3 region of the 16S rRNA gene on the MiSeq platform may require additional in silico sequence curation, and the improved error rates and diversity estimates show that removing low-frequency sequences is reasonable. When datasets have a high number of singletons, these singletons can be removed from the analysis without losing statistical power while reducing error and improving microbiota assessment. [ABSTRACT FROM AUTHOR]
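The singleton and doubleton filtering described in the abstract can be illustrated with a minimal Python sketch. It assumes reads have already been dereplicated into per-sample counts of unique sequences, and it follows the abstract's definition of a singleton as a sequence occurring once across all samples pooled. The function names (`pooled_counts`, `drop_rare`, `observed_richness`, `chao1_bias_corrected`) are hypothetical and are not the authors' pipeline, which the abstract does not describe in detail. The bias-corrected Chao1 estimator is included only as an example of a richness estimate that rare sequences inflate; the abstract does not say which diversity estimators were used.

```python
from collections import Counter

# Hypothetical layout: sample name -> Counter mapping unique sequence -> read count
Samples = dict[str, Counter]


def pooled_counts(samples: Samples) -> Counter:
    """Sum read counts for each unique sequence across all samples."""
    total: Counter = Counter()
    for counts in samples.values():
        total.update(counts)  # Counter.update adds counts rather than replacing them
    return total


def drop_rare(samples: Samples, min_total: int = 2) -> Samples:
    """Remove sequences whose count pooled over all samples is below min_total.

    min_total=2 removes singletons; min_total=3 also removes doubletons.
    """
    totals = pooled_counts(samples)
    keep = {seq for seq, n in totals.items() if n >= min_total}
    return {
        name: Counter({seq: n for seq, n in counts.items() if seq in keep})
        for name, counts in samples.items()
    }


def observed_richness(counts: Counter) -> int:
    """Number of distinct sequences with a nonzero count."""
    return sum(1 for n in counts.values() if n > 0)


def chao1_bias_corrected(counts: Counter) -> float:
    """Bias-corrected Chao1: S_obs + F1*(F1-1) / (2*(F2+1)).

    F1 and F2 are the numbers of sequences seen exactly once and exactly twice.
    """
    f1 = sum(1 for n in counts.values() if n == 1)
    f2 = sum(1 for n in counts.values() if n == 2)
    return observed_richness(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    # Toy counts for illustration only; these are not data from the study.
    mock = {
        "mock_1": Counter({"ACGT": 50, "TTGC": 30, "ACGA": 1}),
        "mock_2": Counter({"ACGT": 45, "TTGC": 28, "ACCT": 1, "TTGA": 1}),
        "mock_3": Counter({"ACGT": 52, "TTGC": 33, "TTGA": 1}),
    }
    for label, data in [
        ("all sequences", mock),
        ("singletons removed", drop_rare(mock, min_total=2)),
        ("singletons and doubletons removed", drop_rare(mock, min_total=3)),
    ]:
        pooled = pooled_counts(data)
        print(label, observed_richness(pooled), chao1_bias_corrected(pooled))
```

With these toy counts the pooled richness falls from 5 to 3 to 2 as the threshold rises, because the rare sequences (standing in for sequencing errors) are dropped. Filtering on the pooled count, rather than per sample, keeps a sequence that is rare in one sample but well supported elsewhere, which matches the abstract's "among all samples" definition.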
- Published
- 2016