Improving Compound–Protein Interaction Prediction by Self-Training with Augmenting Negative Samples

Authors :
00753906
40613328
70807651
Koyama, Takuto
Matsumoto, Shigeyuki
Iwata, Hiroaki
Kojima, Ryosuke
Okuno, Yasushi
Publication Year :
2023

Abstract

Identifying compound-protein interactions (CPIs) is crucial for drug discovery. Since experimentally validating CPIs is often time-consuming and costly, computational approaches are expected to facilitate the process. The rapid growth of available CPI databases has accelerated the development of many machine-learning methods for CPI prediction. However, their performance, particularly their generalizability to external data, often suffers from data imbalance attributable to the lack of experimentally validated inactive (negative) samples. In this study, we developed a self-training method that augments the training data with negative samples that are both credible and informative, in order to improve the performance of models impaired by data imbalance. The constructed model outperformed models built with other conventional methods for addressing data imbalance, and the improvement was most prominent on external datasets. Moreover, examination of the prediction score thresholds used for pseudo-labeling during self-training revealed that augmenting the training data with samples that have ambiguous prediction scores is beneficial for constructing a model with high generalizability. The present study provides guidelines for improving CPI prediction on real-world data, thus facilitating drug discovery.
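
The self-training idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the logistic-regression scorer is a stand-in for any CPI model, the feature vectors are synthetic, and the names `self_train_negatives`, `threshold`, and `rounds` are hypothetical choices made only for illustration. The sketch trains on imbalanced labeled pairs, scores unlabeled pairs, pseudo-labels those scoring below `threshold` as negatives, and retrains.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for extreme logits.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def add_bias(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Gradient-descent logistic regression (assumed stand-in for a CPI model)."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_scores(w, X):
    return sigmoid(add_bias(X) @ w)

def self_train_negatives(X_lab, y_lab, X_unl, threshold=0.5, rounds=3):
    """Iteratively pseudo-label unlabeled pairs scoring below `threshold` as negatives.

    `threshold` and `rounds` are illustrative parameters, not values from the paper.
    Returns the final weights and the number of pseudo-negatives added.
    """
    X_train, y_train = X_lab.copy(), y_lab.astype(float).copy()
    pool = X_unl.copy()
    n_added = 0
    w = fit_logistic(X_train, y_train)
    for _ in range(rounds):
        if len(pool) == 0:
            break
        mask = predict_scores(w, pool) < threshold
        if not mask.any():
            break
        # Move the selected samples from the pool into the training set as negatives.
        X_train = np.vstack([X_train, pool[mask]])
        y_train = np.concatenate([y_train, np.zeros(int(mask.sum()))])
        n_added += int(mask.sum())
        pool = pool[~mask]
        w = fit_logistic(X_train, y_train)
    return w, n_added

# Synthetic, imbalanced toy data: 40 positives, 5 negatives, 30 unlabeled pairs.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=1.0, size=(40, 4))
X_neg = rng.normal(loc=-1.0, size=(5, 4))
X_lab = np.vstack([X_pos, X_neg])
y_lab = np.concatenate([np.ones(40), np.zeros(5)])
X_unl = rng.normal(loc=0.0, size=(30, 4))

w, n_added = self_train_negatives(X_lab, y_lab, X_unl, threshold=0.5)
print("pseudo-negatives added:", n_added)
```

In this sketch, a higher `threshold` admits samples whose scores are closer to the decision boundary, which loosely mirrors the abstract's observation that samples with ambiguous prediction scores can be useful; a very low threshold admits only the most confidently negative samples.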

Details

Database :
OAIster
Notes :
English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1458649403
Document Type :
Electronic Resource