Back to Search Start Over

Recall DNA methylation levels at low coverage sites using a CNN model in WGBS.

Authors :
Luo, Ximei
Wang, Yansu
Zou, Quan
Xu, Lei
Source :
PLoS Computational Biology; 6/14/2023, Vol. 19 Issue 6, p1-14, 14p, 1 Chart, 2 Graphs
Publication Year :
2023

Abstract

DNA methylation is an important regulator of gene transcription. WGBS is the gold-standard approach for base-pair resolution quantitative of DNA methylation. It requires high sequencing depth. Many CpG sites with insufficient coverage in the WGBS data, resulting in inaccurate DNA methylation levels of individual sites. Many state-of-arts computation methods were proposed to predict the missing value. However, many methods required either other omics datasets or other cross-sample data. And most of them only predicted the state of DNA methylation. In this study, we proposed the RcWGBS, which can impute the missing (or low coverage) values from the DNA methylation levels on the adjacent sides. Deep learning techniques were employed for the accurate prediction. The WGBS datasets of H1-hESC and GM12878 were down-sampled. The average difference between the DNA methylation level at 12× depth predicted by RcWGBS and that at >50× depth in the H1-hESC and GM2878 cells are less than 0.03 and 0.01, respectively. RcWGBS performed better than METHimpute even though the sequencing depth was as low as 12×. Our work would help to process methylation data of low sequencing depth. It is beneficial for researchers to save sequencing costs and improve data utilization through computational methods. Author summary: DNA methylation has a major impact on gene regulation. WGBS is the gold standard for investigating the DNA methylation. The DNA methylation level of the sites with low coverage are often not accurate in WGBS datasets. Therefore, we proposed a method based on the CNN model to perform DNA methylation level interpolation for specific sites and named this method as RcWGBS. RcWGBS did not rely on other omics data or other cross-sample data. It only used the sites with sufficient coverage contained in the target WGBS dataset for model training to obtain parameters. Then, the trained model can be used to predict the DNA methylation level of sites with low coverage. Our analyses showed that RcWGBS could recalibrate the methylation level of some CpGs with insufficient coverage. It is suggested that our research could benefit the WGBS datasets with insufficient sequencing coverage. RcWGBS is implemented as an R-packages. It is efficient and convenient and does not need other WGBS or omics data. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1553734X
Volume :
19
Issue :
6
Database :
Complementary Index
Journal :
PLoS Computational Biology
Publication Type :
Academic Journal
Accession number :
164307915
Full Text :
https://doi.org/10.1371/journal.pcbi.1011205