UMI-Gen: a UMI-based reads simulator for variant calling evaluation in paired-end sequencing NGS libraries
- Publication Year :
- 2019
- Publisher :
- Cold Spring Harbor Laboratory, 2019.
Abstract
- Next Generation Sequencing (NGS) has become the standard method for detecting Single Nucleotide Variants (SNV) in tumor cells. These technologies require a PCR amplification step and a sequencing step, and both steps introduce artifacts at very low frequencies. These artifacts are often confused with true low-frequency variants found in tumor cells and cell-free DNA. The recent use of Unique Molecular Identifiers (UMI) in targeted sequencing protocols offers a reliable way to filter out artifactual variants and accurately call low-frequency variants. However, integrating UMI analysis into the variant calling process has produced tools that are significantly slower and more memory-intensive than variant callers that work on raw reads. We present UMI-VarCal, a UMI-based variant caller for targeted sequencing data with better sensitivity than other variant callers. Designed with performance in mind, UMI-VarCal is one of the few variant callers that does not rely on SAMtools for its pileup. Instead, it runs a custom pileup algorithm designed specifically to handle the UMI tags in the reads. After the pileup, a Poisson statistical test is applied at every position to determine whether the variant frequency is significantly higher than the background error noise. Finally, the UMI tags are analyzed, and a strand bias filter and a homopolymer length filter are applied to improve accuracy. We illustrate the results obtained with UMI-VarCal on sequenced tumor samples and show that UMI-VarCal is both faster and more sensitive than other publicly available solutions.
- Subjects :
- chemistry.chemical_classification
0303 health sciences
Computer science
business.industry
Pattern recognition
Filter (signal processing)
DNA sequencing
law.invention
03 medical and health sciences
chemistry.chemical_compound
0302 clinical medicine
chemistry
law
030220 oncology & carcinogenesis
Nucleotide
Noise (video)
Artificial intelligence
business
Polymerase chain reaction
Paired-end tag
DNA
030304 developmental biology
- Database :
- OpenAIRE
- Accession number :
- edsair.doi...........c30c75191fdbf842747b2133bdcac9cf