HSM6AP: a high-precision predictor for the Homo sapiens N6-methyladenosine (m6A) based on multiple weights and feature stitching

Authors :
Li, Jing
He, Shida
Guo, Fei
Zou, Quan
Source :
RNA Biology; November 2021, Vol. 18 Issue: 11 p1882-1892, 11p
Publication Year :
2021

Abstract

Recent studies have shown that RNA methylation modification can affect RNA transcription, metabolism, splicing and stability. In addition, RNA methylation modification has been associated with cancer, obesity and other diseases. Based on human genome information and machine learning, this paper discusses the effect of fusing sequence-level and gene-level feature extraction on the accuracy of methylation site recognition. The discovery of new features exposed two significant limitations of existing computational tools. (1) Most prediction models are based solely on sequence features and use SVM or random forest as the classification method. (2) Limited by the number of samples, the models may not achieve good performance. To establish a better prediction model for methylation sites, specific weighting strategies must be set for the training samples, and more powerful and informative feature matrices must be found to build a comprehensive model. In this paper, we present HSM6AP, a high-precision predictor for the Homo sapiens N6-methyladenosine (m6A) based on multiple weights and feature stitching. Unlike existing methods, HSM6AP weights the samples during training and explores a wide range of features. Max-Relevance-Max-Distance (MRMD) is employed for feature selection, and the feature matrix is generated by fusing the individual features. Extreme gradient boosting (XGBoost), an ensemble machine learning algorithm based on decision trees, is used for model training, and model performance is improved through parameter tuning. Tests on two rigorous independent data sets demonstrated the superiority of HSM6AP in identifying methylation sites. HSM6AP is an advanced predictor that can be directly employed by users (especially non-professional users) to predict methylation sites.
Users can access our related tools and data sets at the following website: http://lab.malab.cn/~lijing/HSM6AP.html
The code of our tool is publicly accessible at https://github.com/lijingtju/HSm6AP.git
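
The two ideas named in the title, feature stitching and sample weighting, can be illustrated with a minimal sketch. It is not the authors' implementation: the two encodings (one-hot and dinucleotide frequency), the inverse-class-frequency weighting, the toy sequences and all function names are assumptions chosen for illustration, and the abstract does not state which features or weights HSM6AP actually uses.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGU"


def one_hot(seq):
    """Position-wise binary encoding: four values per nucleotide."""
    vec = []
    for base in seq:
        vec.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    return vec


def kmer_freq(seq, k=2):
    """Normalized frequency of every possible k-mer (4**k values)."""
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    windows = max(len(seq) - k + 1, 1)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[m] / windows for m in kmers]


def stitch(seq):
    """Feature stitching: concatenate the single-feature vectors."""
    return one_hot(seq) + kmer_freq(seq)


def balanced_weights(labels):
    """Assumed weighting scheme: inverse class frequency per sample."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return [n_samples / (n_classes * counts[y]) for y in labels]


# Hypothetical toy data: 1 = methylation site, 0 = non-site.
sequences = ["GGACU", "AAACU", "UGACA"]
labels = [1, 0, 0]

X = [stitch(s) for s in sequences]   # stitched feature matrix
w = balanced_weights(labels)         # one weight per training sample
```

Per-sample weights such as `w` could then be handed to a classifier that accepts them, for example through the `sample_weight` argument of `xgboost.XGBClassifier.fit`.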

Details

Language :
English
ISSN :
1547-6286 and 1555-8584
Volume :
18
Issue :
11
Database :
Supplemental Index
Journal :
RNA Biology
Publication Type :
Periodical
Accession number :
ejs58222821
Full Text :
https://doi.org/10.1080/15476286.2021.1875180