1. DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences
- Author
Jongsoo Keum, Ingoo Lee, and Hojung Nam
- Subjects
0301 basic medicine; FOS: Computer and information sciences; Models, Molecular; Computer Science - Machine Learning; Computer science; Protein Extraction; Ligands; Convolutional neural network; Biochemistry; Quantitative Biology - Quantitative Methods; Convolution; Machine Learning (cs.LG); Machine Learning; Database and Informatics Methods; 0302 clinical medicine; Protein sequencing; Mathematical and Statistical Techniques; Protein methods; Sequence Analysis, Protein; Drug Discovery; Medicine and Health Sciences; Biology (General); Quantitative Methods (q-bio.QM); Extraction Techniques; Computational model; Ecology; Artificial neural network; Statistics; Protein structure prediction; Enzymes; Computational Theory and Mathematics; Modeling and Simulation; Physical Sciences; Sequence Analysis; Research Article; Computer and Information Sciences; Drug Research and Development; Neural Networks; QH301-705.5; Bioinformatics; Research and Analysis Methods; 03 medical and health sciences; Cellular and Molecular Neuroscience; Deep Learning; Sequence Motif Analysis; Artificial Intelligence; Genetics; Computer Simulation; Amino Acid Sequence; Statistical Methods; Molecular Biology; Ecology, Evolution, Behavior and Systematics; Pharmacology; Binding Sites; business.industry; Deep learning; Biology and Life Sciences; Proteins; Computational Biology; Pattern recognition; 030104 developmental biology; FOS: Biological sciences; Enzymology; Artificial intelligence; business; Mathematical Functions; Protein Kinases; 030217 neurology & neurosurgery; Mathematics; Forecasting; Neuroscience
- Abstract
Identification of drug-target interactions (DTIs) plays a key role in drug discovery. The high cost and labor-intensive nature of in vitro and in vivo experiments have highlighted the importance of in silico DTI prediction approaches. In several computational models, conventional protein descriptors have proven insufficiently informative for accurate DTI prediction. In this study, we therefore propose a deep-learning-based DTI prediction model that captures local residue patterns of proteins participating in DTIs. We apply a convolutional neural network (CNN) to raw protein sequences, performing convolutions over amino acid subsequences of various lengths to capture local residue patterns of generalized protein classes. We train our model on large-scale DTI data and evaluate it on an independent dataset not seen during training. Our model outperforms previous protein descriptor-based models as well as recently developed deep learning models for large-scale DTI prediction. By examining pooled convolution results, we confirm that our model can detect protein binding sites relevant to DTIs. In conclusion, by detecting local residue patterns of target proteins, our model successfully enriches the protein features derived from a raw sequence, yielding better predictions than previous approaches. Our code is available at https://github.com/GIST-CSBL/DeepConv-DTI.

Author summary
Drugs work by interacting with target proteins to activate or inhibit a target’s biological process. Identification of DTIs is therefore a crucial step in drug discovery. However, identifying drug candidates through biological assays is very time-consuming and costly, which creates the need for computational approaches to DTI identification.
In this work, we constructed a novel DTI prediction model that extracts local residue patterns of target protein sequences using a CNN-based deep learning approach. The detected local features of protein sequences outperform other protein descriptors for DTI prediction, and the model outperforms previous models in predicting the independent test datasets from PubChem. In short, capturing local residue patterns with a CNN successfully enriches the protein features obtained from a raw sequence.
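To make the central idea more concrete (convolutions of several window lengths over a raw protein sequence, followed by pooling), here is a minimal pure-Python sketch of that general technique. It is not the authors' implementation, which is in the linked repository. The one-hot encoding, the window lengths (3, 5, 7), the filter count, the random (untrained) filter weights, and all function names are hypothetical choices made for illustration, and the drug side of the model is omitted entirely. Recording which subsequence produced each filter's maximum only loosely mirrors the idea of inspecting pooled convolution results to locate candidate binding regions.

```python
import random

# 20 standard amino acids; any other letter (e.g. 'X') is encoded as all zeros.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a protein sequence as a list of 20-dimensional one-hot vectors."""
    vectors = []
    for aa in seq.upper():
        v = [0.0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:
            v[AA_INDEX[aa]] = 1.0
        vectors.append(v)
    return vectors


def random_filters(window, n_filters, rng):
    """Hypothetical stand-ins for learned weights: n_filters x window x 20."""
    return [[[rng.uniform(-1.0, 1.0) for _ in AMINO_ACIDS] for _ in range(window)]
            for _ in range(n_filters)]


def conv_global_max_pool(x, filters):
    """Valid 1D convolution, then global max pooling and ReLU, for each filter.

    Returns the pooled activations and, per filter, the start index of the
    subsequence that produced the maximum response.
    """
    window = len(filters[0])
    pooled, starts = [], []
    for f in filters:
        scores = [
            sum(f[k][j] * x[s + k][j]
                for k in range(window) for j in range(len(AMINO_ACIDS)))
            for s in range(len(x) - window + 1)
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        # ReLU is monotonic, so applying it after the max equals max after ReLU.
        pooled.append(max(scores[best], 0.0))
        starts.append(best)
    return pooled, starts


def protein_features(seq, windows=(3, 5, 7), n_filters=4, seed=0):
    """Concatenate max-pooled outputs of convolutions with several window lengths."""
    if len(seq) < max(windows):
        raise ValueError("sequence is shorter than the largest window")
    rng = random.Random(seed)  # same seed -> same filters for every protein
    x = one_hot(seq)
    features, hits = [], {}
    for w in windows:
        pooled, starts = conv_global_max_pool(x, random_filters(w, n_filters, rng))
        features.extend(pooled)
        # Map each filter's strongest response back to the residues it covered.
        hits[w] = [(s, seq[s:s + w]) for s in starts]
    return features, hits


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    features, hits = protein_features(seq)
    print(len(features))  # 12: 3 window lengths x 4 filters
    print(hits[7])        # (start index, 7-residue subsequence) per window-7 filter
```

Using several window lengths lets short and longer residue motifs each contribute features, and global max pooling yields a fixed-length vector regardless of sequence length. In a trained model, the filter weights would be learned jointly with the rest of the network rather than drawn at random.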
- Published
- 2018