Identification of DNA N4-methylcytosine sites based on multi-source features and gradient boosting decision tree.

Authors :
Zhang, Shengli
Yao, Yingying
Wang, Jiesheng
Liang, Yunyun
Source :
Analytical Biochemistry. Sep2022, Vol. 652, pN.PAG-N.PAG. 1p.
Publication Year :
2022

Abstract

N4-methylcytosine (4mC) is an important and common DNA methylation that is widespread in prokaryotes. It plays a crucial role in correcting DNA replication errors and protecting host DNA against degradation by restriction enzymes. Hence, accurate identification of 4mC sites is highly significant for understanding biological functions and treating genetic diseases. In this paper, a novel model is designed for identifying 4mC sites. Firstly, we extract features from the original sequences using multi-source feature representation methods: mono-nucleotide binary and k-mer frequency, dinucleotide binary and position-specific frequency, ring-function-hydrogen-chemical properties, dinucleotide-based DNA properties and trinucleotide-based DNA properties. Subsequently, a gradient boosting decision tree (GBDT) is applied to select the optimal feature set and remove redundant information. Finally, a support vector machine is employed to classify sites as 4mC or non-4mC. The accuracies on six datasets reach 0.851, 0.859, 0.801, 0.87, 0.859 and 0.901, respectively, which are superior to those of previous prediction methods. The results therefore show that our predictor is a feasible and effective tool for identifying 4mC sites. Furthermore, an online web server is established at http://dnan4c.zhanglab.site.

• A predictor is designed to identify DNA N4-methylcytosine sites based on multi-source features and GBDT.
• Features are extracted from the original sequences by multi-source feature representation methods.
• GBDT is applied to select the optimal feature set and remove redundant information.
• An online web server is established at http://dnan4c.zhanglab.site.

[ABSTRACT FROM AUTHOR]
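As a rough illustration of three of the feature types named in the abstract (mono-nucleotide binary, k-mer frequency, and ring-function-hydrogen chemical properties), the following minimal Python sketch encodes a DNA sequence into a single numeric vector. It is not the authors' implementation: the function names, the choice of k = 2, the handling of unknown bases, and the commonly used chemical-property coding (ring structure, functional group, hydrogen bonding) are assumptions made for this example. The dinucleotide- and trinucleotide-based features, the GBDT feature selection, and the support vector machine classifier are not shown.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# Assumed coding (ring structure, functional group, hydrogen bonding):
# purine = 1 / pyrimidine = 0, amino = 1 / keto = 0, weak = 1 / strong = 0.
CHEMICAL = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}


def mono_binary(seq):
    """One-hot encode each base as four 0/1 values in A, C, G, T order."""
    features = []
    for base in seq:
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def kmer_frequency(seq, k=2):
    """Return the relative frequency of every possible k-mer in the sequence."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    for i in range(windows):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing unknown bases are skipped
            counts[kmer] += 1
    return [counts[m] / windows for m in kmers]


def chemical_properties(seq):
    """Encode each base by its three chemical-property bits (unknown bases map to zeros)."""
    features = []
    for base in seq:
        features.extend(CHEMICAL.get(base, (0, 0, 0)))
    return features


def encode(seq, k=2):
    """Concatenate the three feature groups into one vector."""
    seq = seq.upper()
    return mono_binary(seq) + kmer_frequency(seq, k) + chemical_properties(seq)
```

For a sequence of length n, `encode` returns 4n one-hot values, 4^k k-mer frequencies and 3n chemical-property values. A vector of this kind, computed for every sample, would then be passed to a feature selection step and a classifier.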

Details

Language :
English
ISSN :
00032697
Volume :
652
Database :
Academic Search Index
Journal :
Analytical Biochemistry
Publication Type :
Academic Journal
Accession number :
157439516
Full Text :
https://doi.org/10.1016/j.ab.2022.114746