1. GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47
- Author
- Wenying Shan, Lvqi Chen, Hao Xu, Qinghao Zhong, Yinqiu Xu, Hequan Yao, Kejiang Lin, and Xuanyi Li
- Subjects
- artificial intelligence, word2vec, GcForest, compound-protein interaction prediction, small-molecule CD47 inhibitors, Chemistry, QD1-999
- Abstract
Identifying compound-protein interactions (CPIs) plays a vital role in drug discovery. Artificial intelligence (AI), especially machine learning (ML) and deep learning (DL) algorithms, is playing an increasingly important role in CPI prediction. However, ML relies on learning from large datasets, and the CPI data available for a specific target are often scarce. To overcome this dilemma, we propose a virtual screening model in which word2vec is used as an embedding tool to generate low-dimensional vectors from the SMILES strings of compounds and the amino acid sequences of proteins, and a modified gcForest (multi-grained cascade forest) is used as the classifier. The proposed method can construct a model from raw data and adjust model complexity according to the scale of the dataset, which suits small datasets in particular; it is robust, with few hyper-parameters and without over-fitting. We found that the proposed model is superior to other CPI prediction models and performs well on a challenging dataset that we constructed. Finally, we predicted 2 new inhibitors of cluster of differentiation 47 (CD47), which has few known inhibitors. The IC50 values for the enzyme activities of these 2 new small-molecule inhibitors targeting the CD47-SIRPα interaction are 3.57 and 4.79 μM, respectively. These results demonstrate the competence of this concise but efficient tool for CPI prediction.
- Published
- 2023
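The abstract describes turning SMILES strings and amino acid sequences into low-dimensional vectors before classification. The following is a minimal sketch of one way such a feature pipeline could look, not the authors' implementation. The overlapping k-mer tokenization, the value of `k`, the mean pooling, and the names `kmer_tokens`, `mean_embedding`, and `pair_features` are all assumptions made for illustration; the lookup tables stand in for a trained word2vec model, and the gcForest classifier itself is not shown.

```python
from typing import Dict, List, Sequence


def kmer_tokens(sequence: str, k: int = 3) -> List[str]:
    """Split a string into overlapping k-character 'words' (stride 1).

    Assumption: the abstract does not state how sequences are tokenized.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(sequence) < k:
        # Too short for a full k-mer: keep the whole string as one token.
        return [sequence] if sequence else []
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def mean_embedding(tokens: Sequence[str],
                   table: Dict[str, List[float]],
                   dim: int) -> List[float]:
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [table[t] for t in tokens if t in table]
    if not known:
        return [0.0] * dim
    return [sum(vec[j] for vec in known) / len(known) for j in range(dim)]


def pair_features(smiles: str,
                  protein: str,
                  smiles_table: Dict[str, List[float]],
                  protein_table: Dict[str, List[float]],
                  dim: int,
                  k: int = 3) -> List[float]:
    """Concatenate compound and protein vectors into one feature row."""
    compound_vec = mean_embedding(kmer_tokens(smiles, k), smiles_table, dim)
    protein_vec = mean_embedding(kmer_tokens(protein, k), protein_table, dim)
    return compound_vec + protein_vec


# Toy lookup tables (hypothetical values, standing in for trained embeddings).
toy_smiles_table = {"CCO": [1.0, 0.0], "CO": [0.0, 1.0]}
toy_protein_table = {"MKV": [1.0, 3.0], "KVL": [3.0, 5.0]}

row = pair_features("CCO", "MKVLA", toy_smiles_table, toy_protein_table, dim=2)
```

Each compound-protein pair becomes a single fixed-length row, which is the input shape a tree-ensemble classifier expects; in practice the tables would come from embeddings trained on large unlabeled corpora of SMILES strings and protein sequences.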