hg19KIndel: ethnicity normalized human reference genome
- Source :
- BMC Genomics, Vol 20, Iss 1, Pp 1-17 (2019)
- Publication Year :
- 2019
- Publisher :
- BioMed Central, 2019.
Abstract
- Background: The most widely used human genome reference assembly, hg19, harbors minor alleles at 2.18 million positions, as revealed by the 1000 Genomes Phase 3 dataset. Although this is less than 2% of the 89 million variants reported, it has been shown that the minor alleles can result in 30% false positives in individual genomes, misleading and burdening downstream interpretation. More alarming, a significant percentage of variants that are homozygous recessive for these minor alleles, with potential disease implications, are masked from reporting. Results: We have demonstrated that the false positives (FP) and false negatives (FN) can be corrected simply by replacing the nucleotides at the minor allele positions in hg19 with the corresponding major alleles. Here, we have replaced 2.18 million minor alleles, comprising Single Nucleotide Polymorphisms (SNPs), Insertions and Deletions (INDELs), and Multiple Nucleotide Polymorphisms (MNPs), in hg19 with the corresponding major alleles to create an ethnically normalized reference genome called hg19KIndel. In doing so, hg19KIndel both corrects sequencing errors acknowledged to be present in hg19 and improves read alignment near the minor alleles in hg19. Conclusion: We have created and made available a new version of the human reference genome called hg19KIndel. Variant calling using hg19KIndel has been shown to significantly reduce false positive calls, which in turn reduces the burden on downstream analysis and validation. It also reduces false negative calls: variants that were being missed because of the minor alleles in hg19 are now called using hg19KIndel. Using hg19KIndel also yields a better mapping percentage than the currently available human reference genome. The hg19KIndel reference genome and its auxiliary datasets are available at https://doi.org/10.5281/zenodo.2638113
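The core idea in the abstract, replacing each minor allele in the reference with the corresponding major allele, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Variant` record, the 0-based coordinate convention, the 0.5 frequency cutoff, and the function name `apply_major_alleles` are all assumptions made for illustration, and the sketch assumes the variants do not overlap.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant record (not a format taken from the paper)."""
    pos: int         # 0-based start of `ref` within the sequence (assumed convention)
    ref: str         # allele currently present in the reference
    alt: str         # alternate allele observed in the population
    alt_freq: float  # population frequency of `alt`


def apply_major_alleles(sequence: str, variants: List[Variant]) -> str:
    """Return `sequence` with every minor reference allele replaced by the major allele.

    A reference allele is treated as minor when the alternate allele frequency
    exceeds 0.5 (an assumed cutoff). Variants are assumed not to overlap.
    """
    # Apply edits from right to left so that insertions and deletions do not
    # shift the coordinates of variants that have not been applied yet.
    for v in sorted(variants, key=lambda v: v.pos, reverse=True):
        if v.alt_freq <= 0.5:
            continue  # the reference already carries the major allele
        end = v.pos + len(v.ref)
        if sequence[v.pos:end] != v.ref:
            raise ValueError(f"reference mismatch at position {v.pos}")
        sequence = sequence[:v.pos] + v.alt + sequence[end:]
    return sequence


toy_reference = "ACGTACGT"
toy_variants = [
    Variant(pos=1, ref="C", alt="T", alt_freq=0.8),    # SNP, reference is minor
    Variant(pos=3, ref="TA", alt="T", alt_freq=0.6),   # deletion, reference is minor
    Variant(pos=6, ref="G", alt="GCC", alt_freq=0.3),  # insertion, reference is major
]
print(apply_major_alleles(toy_reference, toy_variants))  # ATGTCGT
```

On this toy input the SNP and the deletion are applied while the low-frequency insertion is skipped, which is the behaviour the abstract describes for SNPs and INDELs; handling of overlapping or multi-allelic sites is deliberately left out.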
- Subjects :
- 0106 biological sciences
lcsh:QH426-470
lcsh:Biotechnology
Single-nucleotide polymorphism
Biology
Major and minor alleles
01 natural sciences
Genome
Polymorphism, Single Nucleotide
03 medical and health sciences
Disease predisposition
INDEL Mutation
lcsh:TP248.13-248.65
Variant calling
Genetics
False positive paradox
Ethnicity
Humans
Allele
Indel
Alleles
030304 developmental biology
0303 health sciences
Genome, Human
Genetic Variation
Sequence Analysis, DNA
Reference Standards
Minor allele frequency
lcsh:Genetics
Human genome
Human reference genome
Databases, Nucleic Acid
Population study
010606 plant biology & botany
Biotechnology
Reference genome
Research Article
Details
- Language :
- English
- ISSN :
- 1471-2164
- Volume :
- 20
- Database :
- OpenAIRE
- Journal :
- BMC Genomics
- Accession number :
- edsair.doi.dedup.....f046595906c5acc8c013db20780f9a0a