1. A New Machine Learning Based Framework to Identify Protein Glycation Sites Using Comprehensive Features and the mRMR Method
- Author
Lina Zhang, Chengjin Zhang, Runtao Yang, and Chen Jingui
- Subjects
0301 basic medicine, Computer science, Feature extraction, Peptide, Feature selection, Disease, Machine learning, computer.software_genre, ENCODE, 01 natural sciences, Cross-validation, Accessible surface area, Pathogenesis, 03 medical and health sciences, Glycation, Diabetes mellitus, Feature (machine learning), Redundancy (engineering), medicine, chemistry.chemical_classification, business.industry, 010401 analytical chemistry, medicine.disease, 0104 chemical sciences, Support vector machine, 030104 developmental biology, chemistry, Artificial intelligence, business, Protein glycation, computer
- Abstract
Accumulation of the final products of the glycation reaction often leads to diseases such as diabetes, Alzheimer’s disease, and atherosclerosis. Identifying glycation sites can help researchers understand the pathogenesis of these diseases and suggest new ideas for treating them. In this paper, we develop a new predictor, Gly-Predict, based on a support vector machine (SVM). It encodes peptide chains with four types of features: binary encoding of the sequence, grey incidence degree, accessible surface area, and secondary structure probability. The maximum relevance minimum redundancy (mRMR) feature selection algorithm is then used to select an optimal subset of 170 features. On the training set, Gly-Predict achieves an accuracy of 84.815%, a sensitivity of 80.156%, a specificity of 88.868%, and a Matthews correlation coefficient (MCC) of 68.411% under k-fold cross-validation (k = 5). To evaluate Gly-Predict objectively, we tested the model on an independent dataset and compared it with previous predictors. The results indicate that Gly-Predict outperforms existing glycation site predictors.
- Published
- 2019
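The mRMR step named in the abstract can be sketched as follows. This is a minimal illustration of the common greedy difference form of mRMR (relevance to the label minus mean redundancy with already-selected features), not the authors' implementation. It assumes discrete feature values, so continuous features such as grey incidence degree, accessible surface area, or secondary structure probabilities would first need to be discretized (an assumption; the abstract does not say how this was done). The function names `mutual_information` and `mrmr_select` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px = Counter(x)
    py = Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def mrmr_select(features, labels, k):
    """Greedy mRMR (difference form): return k feature indices.

    features: list of columns, each a list of discrete values (one per sample).
    Assumes features are already discretized; ties go to the lowest index.
    """
    relevance = [mutual_information(col, labels) for col in features]
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        best, best_score = None, -math.inf
        for j in remaining:
            if selected:
                # Mean mutual information with the features chosen so far
                redundancy = sum(
                    mutual_information(features[j], features[s]) for s in selected
                ) / len(selected)
            else:
                redundancy = 0.0
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        remaining.remove(best)
    return selected
```

In a toy case where the label is y = 2a + b for two independent binary features a and b, a duplicate copy of a has the same relevance as b but is ranked after b, because its redundancy with the already-selected a cancels its relevance. This is the behavior that lets mRMR trim a large encoded feature set down to a compact, less redundant subset.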