
Protein pKa Prediction by Tree-Based Machine Learning

Authors :
Chen, Ada Y.
Lee, Juyong
Damjanovic, Ana
Brooks, Bernard R.
Source :
Journal of Chemical Theory and Computation; April 2022, Vol. 18 Issue: 4 p2673-2686, 14p
Publication Year :
2022

Abstract

Protonation states of ionizable protein residues modulate many essential biological processes. For correct modeling and understanding of these processes, it is crucial to accurately determine their pKa values. Here, we present four tree-based machine learning models for protein pKa prediction. The four models, Random Forest, Extra Trees, eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), were trained on three experimental PDB and pKa datasets, two of which included a notable portion of internal residues. We observed similar performance among the four machine learning algorithms. The best model trained on the largest dataset performs 37% better than the widely used empirical pKa prediction tool PROPKA and 15% better than the published result from the pKa prediction method DelPhiPKa. The overall root-mean-square error (RMSE) for this model is 0.69, with surface and buried RMSE values being 0.56 and 0.78, respectively, considering six residue types (Asp, Glu, His, Lys, Cys, and Tyr), and 0.63 when considering Asp, Glu, His, and Lys only. We provide pKa predictions for proteins in the human proteome from the AlphaFold Protein Structure Database and observed that 1% of Asp/Glu/Lys residues have highly shifted pKa values close to the physiological pH.
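
The abstract compares models by RMSE and by percentage improvement over a baseline. As a rough illustration only, the sketch below computes both quantities in plain Python. The function names, the sample pKa values, and the reading of "X% better" as a fractional reduction in RMSE are all assumptions made here for illustration; the record does not state how the authors computed the percentages, and none of the numbers below come from the paper.

```python
import math


def rmse(predicted, experimental):
    """Root-mean-square error between two equal-length sequences of pKa values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(rmse_model, rmse_baseline):
    """Fractional RMSE reduction relative to a baseline.

    Assumed definition: (baseline - model) / baseline.
    """
    return (rmse_baseline - rmse_model) / rmse_baseline


# Made-up illustrative values, NOT data from the paper.
experimental = [3.9, 4.4, 6.5, 10.4]
model_pred = [4.1, 4.0, 6.9, 10.2]      # hypothetical ML model output
baseline_pred = [4.5, 3.6, 7.3, 9.8]    # hypothetical baseline output

model_rmse = rmse(model_pred, experimental)
baseline_rmse = rmse(baseline_pred, experimental)
print(f"model RMSE:    {model_rmse:.2f}")
print(f"baseline RMSE: {baseline_rmse:.2f}")
print(f"improvement:   {relative_improvement(model_rmse, baseline_rmse):.0%}")
```

Under this assumed definition, a lower RMSE for the model gives a positive improvement, and an identical RMSE gives zero.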

Details

Language :
English
ISSN :
1549-9618 and 1549-9626
Volume :
18
Issue :
4
Database :
Supplemental Index
Journal :
Journal of Chemical Theory and Computation
Publication Type :
Periodical
Accession number :
ejs59163453
Full Text :
https://doi.org/10.1021/acs.jctc.1c01257