The impact of post-alignment processing procedures on whole-exome sequencing data
- Source :
- Genetics and Molecular Biology, Volume 43, Issue 4 (2020), Article number e20200047, Sociedade Brasileira de Genética (SBG). Published: 13 November 2020
- Publication Year :
- 2020
- Publisher :
- Sociedade Brasileira de Genética, 2020.
Abstract
- Post-alignment processing has been suggested as a way to prevent false-positive variant calls in massively parallel DNA sequencing data. Insertions and deletions are the variants most likely to be misinterpreted by variant-calling algorithms. Using known genetic variants as references in post-processing pipelines can minimize mismatches: these references allow reads to be correctly realigned and recalibrated, resulting in more parsimonious variant calling. In this work, we investigate the impact of using different sets of common variants as references for variant calling from whole-exome sequencing data. We selected reference variants from common insertions and deletions available in the 1000 Genomes Project (1K Genomes) data and in databases from the Latin American Database of Genetic Variation (LatinGen). We used the Genome Analysis Toolkit to perform post-processing procedures, namely local realignment and quality recalibration, followed by variant calling in whole-exome samples. For all groups, the call set contained more variants when no post-processing procedure was performed. We found a higher concordance rate between the variants called using the 1K Genomes and LatinGen references. We therefore believe that the additional rare variants identified without realignment or quality recalibration are likely false positives.
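The concordance comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state how the concordance rate was defined, so the sketch assumes shared calls divided by the union of calls. The names `concordance_rate` and `private_calls` are hypothetical, and the variants are invented toy data.

```python
from typing import Iterable, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
# This key is an assumption made for illustration only.
Variant = Tuple[str, int, str, str]


def concordance_rate(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> float:
    """Fraction of variants shared by two call sets, relative to their union."""
    set_a: Set[Variant] = set(calls_a)
    set_b: Set[Variant] = set(calls_b)
    union = set_a | set_b
    if not union:
        return 1.0  # two empty call sets agree trivially
    return len(set_a & set_b) / len(union)


def private_calls(calls: Iterable[Variant], *others: Iterable[Variant]) -> Set[Variant]:
    """Variants present in `calls` but absent from every other call set."""
    seen_elsewhere: Set[Variant] = set()
    for other in others:
        seen_elsewhere |= set(other)
    return set(calls) - seen_elsewhere


# Toy call sets (invented data): two processed with different reference
# variant sets, and one called without any post-processing.
calls_1kg = {("chr1", 100, "A", "G"), ("chr2", 200, "CT", "C"), ("chr3", 300, "G", "GA")}
calls_latingen = calls_1kg | {("chr4", 400, "T", "C")}
calls_unprocessed = {
    ("chr1", 100, "A", "G"),
    ("chr2", 200, "CT", "C"),
    ("chr5", 500, "GTT", "G"),
    ("chr6", 600, "C", "CAA"),
}

if __name__ == "__main__":
    print("1K Genomes vs LatinGen:", concordance_rate(calls_1kg, calls_latingen))
    print("1K Genomes vs unprocessed:", concordance_rate(calls_1kg, calls_unprocessed))
    print("Only in unprocessed:", sorted(private_calls(calls_unprocessed, calls_1kg, calls_latingen)))
```

Calls that appear only in the unprocessed set are candidates for the kind of likely false positives the abstract describes. In practice the call sets would be read from VCF files rather than written inline.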
- Subjects :
- 0106 biological sciences
0301 basic medicine
Concordance
LatinGen
Computational biology
QH426-470
Biology
01 natural sciences
Genome
BIPMed
DNA sequencing
Set (abstract data type)
03 medical and health sciences
Identification (information)
Genomics and Bioinformatics
variant discovery
030104 developmental biology
Sequence alignment
Genetic variation
Genetics
quality recalibration
Molecular Biology
Exome
Exome sequencing
010606 plant biology & botany
Details
- Language :
- English
- ISSN :
- 1678-4685 and 1415-4757
- Volume :
- 43
- Issue :
- 4
- Database :
- OpenAIRE
- Journal :
- Genetics and Molecular Biology
- Accession number :
- edsair.doi.dedup.....0ee16f489ed78527dfc1dd075205620b