251. PDRLGB: precise DNA-binding residue prediction using a light gradient boosting machine
- Author
- Wenyi Yang, Juan Pan, Hui Liu, Xiaojie Xu, Lei Deng, and Chuyao Liu
- Subjects
DNA-binding residue, 0301 basic medicine, Support Vector Machine, Protein Conformation, Computer science, 0206 medical engineering, Feature selection, Incremental feature selection, 02 engineering and technology, lcsh:Computer applications to medicine. Medical informatics, Biochemistry, Machine Learning, 03 medical and health sciences, Light gradient boosting, Structural Biology, Humans, AdaBoost, lcsh:QH301-705.5, Molecular Biology, Binding Sites, business.industry, Research, Applied Mathematics, Computational Biology, Pattern recognition, DNA, Computer Science Applications, Random forest, DNA-Binding Proteins, Support vector machine, ComputingMethodologies_PATTERNRECOGNITION, 030104 developmental biology, lcsh:Biology (General), lcsh:R858-859.7, Gradient boosting, Artificial intelligence, business, Algorithms, 020602 bioinformatics, Protein Binding
- Abstract
Background: Identifying the specific residues involved in protein-DNA interactions is of considerable importance for understanding the binding mechanism of protein-DNA complexes. Although many computational approaches for predicting DNA-binding residues have been developed, there is still significant room for improvement in overall performance and availability.
Results: Here, we present an efficient approach termed PDRLGB that uses a light gradient boosting machine (LightGBM) to predict binding residues in protein-DNA complexes. First, we extract 913 sequence and structure features using a sliding window of size 11. Then, we apply the random forest algorithm to rank the features in descending order of importance and obtain the optimal feature subset using incremental feature selection. Based on the selected feature set, we use a light gradient boosting machine to build the prediction model for DNA-binding residues. PDRLGB shows better overall predictive accuracy and shorter training time than other widely used machine learning (ML) methods such as random forest (RF), AdaBoost, and support vector machine (SVM). We further compare PDRLGB with various existing approaches on independent test datasets and show improved results over existing state-of-the-art approaches.
Conclusions: PDRLGB is an efficient approach for predicting the specific residues involved in protein-DNA interactions.
- Published
- 2018
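The windowing and incremental feature selection steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `window_features` and `incremental_feature_selection`, the zero-padding at the sequence ends, and the `score` callable are all assumptions made for illustration. In the pipeline the abstract describes, the ranking would come from random forest feature importances and the score from evaluating a LightGBM model; both are stubbed out here so the example uses only the standard library.

```python
from typing import Callable, List, Sequence

WINDOW = 11  # window size stated in the abstract


def window_features(per_residue: Sequence[Sequence[float]],
                    window: int = WINDOW) -> List[List[float]]:
    """Concatenate each residue's features with those of its neighbours.

    Positions that fall outside the sequence are zero-padded
    (an assumption; the abstract does not describe edge handling).
    """
    half = window // 2
    pad = [0.0] * len(per_residue[0])
    rows = []
    for i in range(len(per_residue)):
        row: List[float] = []
        for j in range(i - half, i + half + 1):
            row.extend(per_residue[j] if 0 <= j < len(per_residue) else pad)
        rows.append(row)
    return rows


def incremental_feature_selection(ranked: Sequence[int],
                                  score: Callable[[List[int]], float]) -> List[int]:
    """Grow the feature set one ranked feature at a time.

    `ranked` lists feature indices from most to least important.
    `score` is a hypothetical callable returning a validation score for a
    feature subset. The best-scoring prefix of the ranking is returned.
    """
    best_score, best_k = float("-inf"), 0
    for k in range(1, len(ranked) + 1):
        current = score(list(ranked[:k]))
        if current > best_score:
            best_score, best_k = current, k
    return list(ranked[:best_k])
```

With a window of 11, each residue's row holds 11 times as many values as the per-residue feature vector. The selection loop evaluates every prefix of the ranking, so its cost is dominated by how expensive `score` is to call.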