vi-HMM: a novel HMM-based method for sequence variant identification in short-read data
- Source :
- Human Genomics, Vol 13, Iss 1, Pp 1-12 (2019)
- Publication Year :
- 2019
- Publisher :
- Springer Science and Business Media LLC, 2019.
Abstract
- Background: Accurate and reliable identification of sequence variants, including single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs), plays a fundamental role in next-generation sequencing (NGS) applications. Existing methods for calling these variants often make simplifying assumptions of positional independence and fail to leverage the dependence between genotypes at nearby loci that is caused by linkage disequilibrium (LD).
- Results and conclusion: We propose vi-HMM, a hidden Markov model (HMM)-based method for calling SNPs and INDELs in mapped short-read data. This method allows transitions between the hidden states (defined as “SNP,” “Ins,” “Del,” and “Match”) of adjacent genomic bases and determines an optimal hidden state path by using the Viterbi algorithm. The inferred hidden state path provides a direct solution to the identification of SNPs and INDELs. Simulation studies show that, under various sequencing depths, vi-HMM outperforms commonly used variant calling methods in terms of sensitivity and F1 score. When applied to real data, vi-HMM demonstrates higher accuracy in calling SNPs and INDELs.
- Electronic supplementary material: The online version of this article (10.1186/s40246-019-0194-6) contains supplementary material, which is available to authorized users.
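The abstract describes decoding a four-state HMM ("Match," "SNP," "Ins," "Del") along adjacent genomic positions with the Viterbi algorithm, then reading variants directly off the decoded path. The sketch below illustrates only that decoding idea under loudly stated assumptions: each reference position is assumed to be pre-summarized into one hypothetical pileup symbol ("ref", "alt", "ins", "del"), and all names (`STATES`, `START`, `TRANS`, `EMIT`, `viterbi`, `calls_from_path`) and every probability are illustrative placeholders, not vi-HMM's actual emission model, parameters, or code.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hidden states named in the abstract. Every probability below is a
# HYPOTHETICAL placeholder for illustration, not a vi-HMM parameter.
STATES: Tuple[str, ...] = ("Match", "SNP", "Ins", "Del")

# Hypothetical observation alphabet: one summary symbol per reference base.
#   "ref" = reads agree with the reference base
#   "alt" = reads support a different base
#   "ins" = reads carry inserted bases after this position
#   "del" = reads skip this reference base
START: Dict[str, float] = {"Match": 0.97, "SNP": 0.01, "Ins": 0.01, "Del": 0.01}

# TRANS[a][b] = P(state b at position i+1 | state a at position i)
TRANS: Dict[str, Dict[str, float]] = {
    "Match": {"Match": 0.97, "SNP": 0.01, "Ins": 0.01, "Del": 0.01},
    "SNP":   {"Match": 0.97, "SNP": 0.01, "Ins": 0.01, "Del": 0.01},
    "Ins":   {"Match": 0.80, "SNP": 0.01, "Ins": 0.18, "Del": 0.01},
    "Del":   {"Match": 0.80, "SNP": 0.01, "Ins": 0.01, "Del": 0.18},
}

# EMIT[state][symbol] = P(observed symbol | hidden state)
EMIT: Dict[str, Dict[str, float]] = {
    "Match": {"ref": 0.997, "alt": 0.001, "ins": 0.001, "del": 0.001},
    "SNP":   {"ref": 0.05, "alt": 0.93, "ins": 0.01, "del": 0.01},
    "Ins":   {"ref": 0.05, "alt": 0.01, "ins": 0.93, "del": 0.01},
    "Del":   {"ref": 0.05, "alt": 0.01, "ins": 0.01, "del": 0.93},
}


def viterbi(obs: Sequence[str]) -> List[str]:
    """Most probable hidden-state path for `obs`, computed in log space."""
    if not obs:
        return []
    # score[s]: best log-probability of any path that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    backptrs: List[Dict[str, str]] = []
    for symbol in obs[1:]:
        new_score: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for s in STATES:
            # Pick the predecessor state that maximizes the path score into s.
            prev, best = max(
                ((p, score[p] + math.log(TRANS[p][s])) for p in STATES),
                key=lambda item: item[1],
            )
            new_score[s] = best + math.log(EMIT[s][symbol])
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the highest-scoring final state.
    path = [max(STATES, key=lambda s: score[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


def calls_from_path(path: Sequence[str]) -> List[Tuple[int, str]]:
    """Report every position whose decoded state is not Match."""
    return [(i, state) for i, state in enumerate(path) if state != "Match"]


if __name__ == "__main__":
    pileup = ["ref", "ref", "alt", "ref", "del", "del", "ref"]
    print(calls_from_path(viterbi(pileup)))  # -> [(2, 'SNP'), (4, 'Del'), (5, 'Del')]
```

Scores are accumulated as log-probabilities so that long genomic regions do not underflow. Because transitions couple neighboring positions, the two adjacent "del" symbols are decoded as one contiguous Del run rather than as independent events, which is the kind of positional dependence the abstract contrasts with independence-based callers.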
- Subjects :
- Linkage disequilibrium
lcsh:QH426-470
Computer science
lcsh:Medicine
SNP
Computational biology
Viterbi algorithm
Polymorphism, Single Nucleotide
Linkage Disequilibrium
03 medical and health sciences
INDEL Mutation
Variant calling
Databases, Genetic
Drug Discovery
Genetics
Humans
Leverage (statistics)
HMM
Indel
Hidden Markov model
Molecular Biology
0303 health sciences
lcsh:R
030305 genetics & heredity
Genetic Variation
High-Throughput Nucleotide Sequencing
INDEL
Quantitative Biology::Genomics
Markov Chains
lcsh:Genetics
Identification (information)
Haplotypes
Path (graph theory)
Molecular Medicine
Primary Research
F1 score
Algorithms
Details
- ISSN :
- 14797364
- Volume :
- 13
- Database :
- OpenAIRE
- Journal :
- Human Genomics
- Accession number :
- edsair.doi.dedup.....c86d329bf02d4a2617a633a4756600c5