
Deep Denoising of Raw Biomedical Knowledge Graph from COVID-19 Literature, LitCovid and Pubtator.

Authors :
Jiang, Chao
Ngo, Victoria
Chapman, Richard
Yu, Yue
Liu, Hongfang
Jiang, Guoqian
Zong, Nansu
Source :
Journal of Medical Internet Research; Jul2022, Vol. 24 Issue 7, pN.PAG-N.PAG, 1p, 10 Color Photographs, 1 Chart
Publication Year :
2022

Abstract

<bold>Background: </bold>Multiple types of biomedical associations in knowledge graphs, including COVID-19-related ones, are constructed from co-occurring biomedical entities retrieved from recent literature. However, applications derived from these raw graphs (e.g., association predictions among genes, drugs, and diseases) have a high probability of false-positive predictions, as co-occurrence in the literature does not always indicate a true biomedical association between two entities.
<bold>Objective: </bold>Data quality plays an important role in training deep neural network models; however, most current work in this area has focused on improving a model's performance under the assumption that the preprocessed data are clean. Here, we studied how to remove noise from raw knowledge graphs with limited labeled information.
<bold>Methods: </bold>The proposed framework used generative deep neural networks to generate a graph that can distinguish the unknown associations in the raw training graph. Two generative adversarial network models, NetGAN and CELL, were adopted for edge classification (i.e., link prediction), leveraging unlabeled link information based on a real knowledge graph built from LitCovid and Pubtator.
<bold>Results: </bold>Link prediction performance, especially in the extreme case of a 1:9 ratio of training data to test data, demonstrated that the proposed method still achieved favorable results (AUCROC > 0.8 for the synthetic dataset and 0.7 for the real dataset) despite the limited amount of testing data available.
<bold>Conclusions: </bold>Our preliminary findings showed that the proposed framework achieved promising results for removing noise during data preprocessing of the biomedical knowledge graph, potentially improving the performance of downstream applications by providing cleaner data.
[ABSTRACT FROM AUTHOR]
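The evaluation setup named in the Results (a 1:9 split of training data to test data, scored by AUCROC) can be illustrated with a minimal Python sketch. The function names `split_edges` and `auroc` and the toy edge triples are assumptions made for illustration; they are not taken from the paper, and the scoring models themselves (NetGAN and CELL) are not reproduced here.

```python
import random


def split_edges(edges, train_fraction=0.1, seed=0):
    """Shuffle labeled edges and keep `train_fraction` of them for training.

    The default of 0.1 corresponds to a 1:9 train-to-test ratio.
    """
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    cut = max(1, round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def auroc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive edge scores higher than a randomly chosen negative edge
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical (source, target, label) triples; label 1 marks a true association.
labeled_edges = [(i, i + 1, i % 2) for i in range(20)]
train_edges, test_edges = split_edges(labeled_edges)
```

In practice, a link prediction model would be fitted on `train_edges` plus the unlabeled graph, and `auroc` would be called on its scores for `test_edges`. The pairwise form above is quadratic in the number of edges, so it suits small examples rather than full knowledge graphs.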

Details

Language :
English
ISSN :
14394456
Volume :
24
Issue :
7
Database :
Supplemental Index
Journal :
Journal of Medical Internet Research
Publication Type :
Academic Journal
Accession number :
158333645
Full Text :
https://doi.org/10.2196/38584