Analysis of error profiles in deep next-generation sequencing data
- Source :
- Genome Biology, Vol 20, Iss 1, Pp 1-15 (2019)
- Publication Year :
- 2019
- Publisher :
- BioMed Central, 2019.
- Abstract :
- Background: Sequencing errors are key confounding factors for detecting low-frequency genetic variants that are important for cancer molecular diagnosis, treatment, and surveillance using deep next-generation sequencing (NGS). However, there is a lack of comprehensive understanding of the errors introduced at various steps of a conventional NGS workflow, such as sample handling, library preparation, PCR enrichment, and sequencing. In this study, we use current NGS technology to systematically investigate these error sources.
  Results: By evaluating read-specific error distributions, we discover that the substitution error rate can be computationally suppressed to 10⁻⁵ to 10⁻⁴, which is 10- to 100-fold lower than the rate generally considered achievable (10⁻³) in the current literature. We then quantify substitution errors attributable to sample handling, library preparation, enrichment PCR, and sequencing by using multiple deep sequencing datasets. We find that error rates differ by nucleotide substitution type, ranging from 10⁻⁵ for A>C/T>G, C>A/G>T, and C>G/G>C changes to 10⁻⁴ for A>G/T>C changes. Furthermore, C>T/G>A errors exhibit strong sequence context dependency, sample-specific effects dominate elevated C>A/G>T errors, and target-enrichment PCR led to an approximately 6-fold increase in the overall error rate. We also find that more than 70% of hotspot variants can be detected at frequencies of 0.1% to 0.01% with current NGS technology by applying in silico error suppression.
  Conclusions: We present the first comprehensive analysis of sequencing error sources in conventional NGS workflows. The error profiles revealed by our study highlight new directions for further improving NGS analysis accuracy both experimentally and computationally, ultimately enhancing the precision of deep sequencing.
  Electronic supplementary material: The online version of this article (10.1186/s13059-019-1659-6) contains supplementary material, which is available to authorized users.
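The abstract reports error rates per strand-symmetric substitution class (for example A>C/T>G). The sketch below illustrates how such per-class rates could be tabulated from base counts. It is not the authors' pipeline: the function names `collapse` and `error_rates`, the input layout, and the toy counts are all assumptions made for illustration.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def collapse(ref, alt):
    """Return the strand-symmetric class label, e.g. T>G becomes 'A>C/T>G'."""
    forward = f"{ref}>{alt}"
    reverse = f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
    return "/".join(sorted((forward, reverse)))


def error_rates(base_counts):
    """Compute a per-class substitution error rate.

    base_counts (hypothetical layout) maps (reference_base, read_base)
    to the number of sequenced bases with that combination.
    Rate = mismatches in the class / bases sequenced over either
    reference base belonging to the class.
    """
    ref_depth = Counter()
    for (ref, _read), n in base_counts.items():
        ref_depth[ref] += n

    mismatches = Counter()
    depth = {}
    for (ref, read), n in base_counts.items():
        if ref == read:
            continue  # matching bases only contribute to depth
        label = collapse(ref, read)
        mismatches[label] += n
        depth[label] = ref_depth[ref] + ref_depth[COMPLEMENT[ref]]

    return {k: mismatches[k] / depth[k] for k in mismatches if depth[k] > 0}


# Toy counts, invented for illustration only (not data from the paper).
toy_counts = {
    ("A", "A"): 1_000_000, ("T", "T"): 1_000_000,
    ("C", "C"): 1_000_000, ("G", "G"): 1_000_000,
    ("A", "G"): 120, ("T", "C"): 80,   # A>G/T>C class
    ("C", "A"): 10, ("G", "T"): 10,    # C>A/G>T class
}

for label, rate in sorted(error_rates(toy_counts).items()):
    print(f"{label}: {rate:.2e}")
```

With these invented counts the A>G/T>C class comes out near 10⁻⁴ and the C>A/G>T class near 10⁻⁵, the same orders of magnitude the abstract reports for those classes. Collapsing each substitution with its reverse complement keeps the result independent of which strand was read.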
- Subjects :
- Quality Control
lcsh:QH426-470
Deep sequencing
Library preparation
In silico
Word error rate
Nucleotide substitution
Computational biology
Biology
Polymerase Chain Reaction
DNA sequencing
03 medical and health sciences
0302 clinical medicine
Neoplasms
Humans
lcsh:QH301-705.5
030304 developmental biology
Sample handling
0303 health sciences
Research
Genetic variants
High-Throughput Nucleotide Sequencing
Hotspot mutation
Sequence Analysis, DNA
Subclonal
lcsh:Genetics
Detection
lcsh:Biology (General)
Case-Control Studies
Mutation
Error rate
Substitution
030217 neurology & neurosurgery
Software
Details
- Language :
- English
- ISSN :
- 1474-760X and 1474-7596
- Volume :
- 20
- Database :
- OpenAIRE
- Journal :
- Genome Biology
- Accession number :
- edsair.doi.dedup.....ffedc44bd94c86d80dcef45d103fde6b