PubMed-Scale Chemical Concept Embeddings Reconstruct Physical Protein Interaction Networks
- Source :
- Frontiers in Research Metrics and Analytics, Vol 6 (2021)
- Publication Year :
- 2021
- Publisher :
- Frontiers Media S.A., 2021.
- Abstract :
- PubMed is the largest resource of curated biomedical knowledge to date, containing more than 25 million documents. Large quantities of novel literature prevent a single expert from keeping track of all potentially relevant papers, resulting in knowledge gaps. In this article, we present CHEMMESHNET, a newly developed PubMed-based network comprising more than 10,000,000 associations, constructed from expert-curated MeSH annotations of chemicals based on all currently available PubMed articles. By learning latent representations of concepts in the obtained network, we demonstrate in a proof-of-concept study that purely literature-based representations are sufficient for the reconstruction of a large part of the currently known network of physical, empirically determined protein–protein interactions. We demonstrate that simple linear embeddings of node pairs, when coupled with a neural network–based classifier, reliably reconstruct the existing collection of empirically confirmed protein–protein interactions. Furthermore, we demonstrate how pairs of learned representations can be used to prioritize potentially interesting novel interactions based on the common chemical context. Highly ranked interactions are qualitatively inspected in terms of potential complex formation at the structural level and represent potentially interesting new knowledge. We demonstrate that two protein–protein interactions, prioritized by structure-based approaches, also emerge as probable with regard to the trained machine-learning model.
- Subjects :
- 0301 basic medicine
PubMed
data-mining
Computer science
Context (language use)
02 engineering and technology
Bibliography. Library science. Information resources
Literature-based discovery
03 medical and health sciences
representation learning
Research Metrics and Analytics
0202 electrical engineering, electronic engineering, information engineering
Original Research
Information retrieval
Artificial neural network
Node (networking)
Scale (chemistry)
machine-learning
literature-based discovery
030104 developmental biology
knowledge graphs
Proof of concept
020201 artificial intelligence & image processing
Classifier (UML)
Feature learning
Details
- Language :
- English
- ISSN :
- 2504-0537
- Volume :
- 6
- Database :
- OpenAIRE
- Journal :
- Frontiers in Research Metrics and Analytics
- Accession number :
- edsair.doi.dedup.....f95e0900f165885efde8f7482f8a2b84
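
The pipeline outlined in the abstract (a concept network built from per-article annotations, node representations, and linear embeddings of node pairs) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy annotations, the function names, the use of adjacency rows as a stand-in for learned latent representations, and the element-wise mean as the linear pair operator. It is not the authors' implementation, and the neural network classifier that would consume the pair vectors is omitted.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(annotations):
    """Count how often two concepts are annotated on the same document."""
    edges = Counter()
    for concepts in annotations.values():
        # Sort so each undirected pair is always stored in the same order.
        for a, b in combinations(sorted(set(concepts)), 2):
            edges[(a, b)] += 1
    return edges


def node_vectors(edges):
    """Stand-in for learned embeddings: each node's row of edge weights."""
    nodes = sorted({n for pair in edges for n in pair})
    weight = {}
    for (a, b), w in edges.items():
        weight[(a, b)] = w
        weight[(b, a)] = w
    return {n: [weight.get((n, m), 0) for m in nodes] for n in nodes}


def pair_embedding(u, v):
    """Combine two node vectors linearly (element-wise mean, an assumption)."""
    return [(x + y) / 2 for x, y in zip(u, v)]


# Hypothetical toy annotations: document id -> annotated chemical concepts.
annotations = {
    "doc1": ["Insulin", "Glucose", "Glucagon"],
    "doc2": ["Insulin", "Glucose"],
    "doc3": ["Glucagon", "Glucose"],
}

network = build_cooccurrence_network(annotations)
vectors = node_vectors(network)
candidate = pair_embedding(vectors["Insulin"], vectors["Glucagon"])
print(network)
print(candidate)
```

In a fuller setup, pair vectors for known interacting and non-interacting protein pairs would serve as training input for a classifier, and the classifier's scores on unseen pairs would be used to rank candidate interactions.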