Back to Search Start Over

hg19KIndel: ethnicity normalized human reference genome

Authors :
Harsh G. Shukla
Pushpinder Bawa
Subhashini Srinivasan
Source :
BMC Genomics, BMC Genomics, Vol 20, Iss 1, Pp 1-17 (2019)
Publication Year :
2019
Publisher :
BioMed Central, 2019.

Abstract

Background The most widely used human genome reference assembly hg19 harbors minor alleles at 2.18 million positions as revealed by 1000 Genome Phase 3 dataset. Although this is less than 2% of the 89 million variants reported, it has been shown that the minor alleles can result in 30% false positives in individual genomes, thus misleading and burdening downstream interpretation. More alarming is the fact that, significant percentage of variants that are homozygous recessive for these minor alleles, with potential disease implications, are masked from reporting. Results We have demonstrated that the false positives (FP) and false negatives (FN) can be corrected for by simply replacing nucleotides at the minor allele positions in hg19 with corresponding major allele. Here, we have effectively replaced 2.18 million minor alleles Single Nucleotide Polymorphism (SNPs), Insertion and Deletions (INDELs), Multiple Nucleotide Polymorphism (MNPs) in hg19 with the corresponding major alleles to create an ethnically normalized reference genome called hg19KIndel. In doing so, hg19KIndel has both corrected for sequencing errors acknowledged to be present in hg19 and has improved read alignment near the minor alleles in hg19. Conclusion We have created and made available a new version human reference genome called hg19KIndel. It has been shown that variant calling using hg19KIndel, significantly reduces false positives calls, which in-turn reduces the burden from downstream analysis and validation. It also improved false negative variants call, which means that the variants which were getting missed due to the presence of minor alleles in hg19, will now be called using hg19KIndel. Using hg19KIndel, one even gets a better mapping percentage when compared to currently available human reference genome. hg19KIndel reference genome and its auxiliary datasets are available at https://doi.org/10.5281/zenodo.2638113

Details

Language :
English
ISSN :
14712164
Volume :
20
Database :
OpenAIRE
Journal :
BMC Genomics
Accession number :
edsair.doi.dedup.....f046595906c5acc8c013db20780f9a0a