Back to Search Start Over

dpGMM: A Dirichlet Process Gaussian Mixture Model for Copy Number Variation Detection in Low-Coverage Whole-Genome Sequencing Data

Authors :
Yaoyao Li
Junying Zhang
Xiguo Yuan
Junping Li
Source :
IEEE Access, Vol 8, Pp 27973-27985 (2020)
Publication Year :
2020
Publisher :
IEEE, 2020.

Abstract

Comprehensive identification and cataloging of copy number variation (CNVs) are essential to providing a complete view of human genetic variation and to finding diseased genes. Due to the large-scale sequencing and cost control whole-genome sequencing (WGS) data, low-coverage data is favorably disposed towards CNV identification. However, such low-coverage data is sensitive to noise and sequencing biases, which results in low resolution of CNV detection in past experimental designs for WGS datasets. In this paper, we present a control-free Dirichlet process Gaussian mixture model (dpGMM) based approach, to analyze the read depth (RD) of low-coverage WGS datasets for CNV discovery. First, noise and biases of the RD signals are corrected through the preprocessing step of dpGMM. Then we assume that RD signals across genomic regions follow a Gaussian mixture model (GMM) in which each Gaussian distribution is followed by a copy number state. Without requiring the number of Gaussian distributions, dpGMM builds a Dirichlet process (DP) GMM for RD signals and further uses a DP prior to infer the number of Gaussian models. After that, we apply dpGMM to simulation datasets with different coverages and individual datasets, and compare ours to three widely used RD-based pipelines, CNVnator, GROM-RD, and BIC-seq2. Simulation results demonstrate that our approach, dpGMM, has a high F1 score in both low- and high- coverage sequences. Also, the number of overlaps between CNVs detected in real data by ours and the standard benchmark is twice as much as that detected by other tools such as CNVnator and GROM-RD.

Details

Language :
English
ISSN :
21693536
Volume :
8
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.098aea35d9e04226a11f38f42ac7a977
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2020.2971863