Use Chou's 5-steps rule to identify DNase I hypersensitive sites via dinucleotide property matrix and extreme gradient boosting.

Authors :
Zhang, Shengli
Xue, Tian
Source :
Molecular Genetics & Genomics. Nov2020, Vol. 295 Issue 6, p1431-1442. 12p.
Publication Year :
2020

Abstract

DNase I hypersensitive sites (DHSs) are regions of active chromatin that are highly sensitive to cleavage by the DNase I enzyme. They provide a basis for studying the mechanisms of gene transcriptional regulation and play an important role in the analysis of the regulatory elements of gene expression. The identification of DHSs has contributed to biomedical research and genome analysis. Southern blotting and high-throughput sequencing can identify DHSs, but these experimental methods are often time-consuming and expensive, so novel and powerful computational methods are needed to predict DHSs. Researchers in related fields have proposed many feasible methods for identifying DHSs; however, the accuracy of these methods is not satisfactory, so more effective methods are needed. Therefore, building on previous studies, we design a novel predictor called iDHS-DXG. First, we choose three sequence-derived feature representation methods to extract features: kmer, mismatch, and the dinucleotide property matrix based on the Moran coefficient. Truncated singular value decomposition is used to reduce the dimensionality of the benchmark dataset, and the optimal dimension is determined by testing. Then, the synthetic minority over-sampling technique is used to balance the positive and negative samples. After that, we introduce an extreme gradient boosting ensemble classifier to predict DHSs. Under five-fold cross-validation, the main performance evaluation metrics of our method improve on previously reported results. DHSs were identified on two human genome datasets with accuracies of 90.84% and 91.27%, respectively. This result shows that our method is a feasible, effective and competitive tool for the analysis of gene regulatory elements.
Our research can help biologists and geneticists study genome analysis and gene regulation mechanisms. It may also be of great significance to research on human disease and to drug design. Furthermore, the datasets and codes of iDHS-DXG can be obtained from the website: http://github.com/Xtian-696/iDHS-DXG/. [ABSTRACT FROM AUTHOR]
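
Two of the feature representations named in the abstract, kmer frequencies and a Moran-coefficient feature over a dinucleotide property, can be illustrated with a minimal sketch. This is not the iDHS-DXG code. The function names are hypothetical, the property table holds placeholder numbers rather than real physicochemical values, and the Moran formula is one common autocorrelation form that may differ from the definition used in the paper.

```python
from itertools import product

ALPHABET = "ACGT"

# Placeholder property table (assumption): NOT real physicochemical values.
# A real run would load one measured dinucleotide property per entry.
TOY_PROPERTY = {
    a + b: float(4 * ALPHABET.index(a) + ALPHABET.index(b))
    for a, b in product(ALPHABET, repeat=2)
}


def kmer_frequencies(seq, k=2):
    """Normalized k-mer frequencies, in lexicographic ACGT order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def moran_coefficient(seq, prop, lag=1):
    """Moran-style autocorrelation of a dinucleotide property at a given lag.

    Computes the mean product of centred property values `lag` positions
    apart, divided by the variance of the property along the sequence.
    """
    seq = seq.upper()
    values = [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]
    n = len(values)
    if lag < 1 or lag >= n:
        raise ValueError("lag must satisfy 1 <= lag < number of dinucleotides")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        return 0.0  # constant property profile: define the coefficient as 0
    covariance = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    ) / (n - lag)
    return covariance / variance


if __name__ == "__main__":
    example = "ACGTACGGTCA"  # made-up sequence, for illustration only
    features = kmer_frequencies(example, k=2)
    features += [moran_coefficient(example, TOY_PROPERTY, lag) for lag in (1, 2)]
    print(len(features))
```

In a pipeline like the one the abstract describes, vectors of this kind would be concatenated per sequence before dimensionality reduction, class balancing, and classification; those later stages rely on third-party libraries and are omitted here.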

Details

Language :
English
ISSN :
16174615
Volume :
295
Issue :
6
Database :
Academic Search Index
Journal :
Molecular Genetics & Genomics
Publication Type :
Academic Journal
Accession number :
146150545
Full Text :
https://doi.org/10.1007/s00438-020-01711-8