Evolving knowledge graph similarity for supervised learning in complex biomedical domains
- Source :
- BMC Bioinformatics, Vol 21, Iss 1, Pp 1-19 (2020), BMC Bioinformatics
- Publication Year :
- 2020
- Publisher :
- BMC, 2020.
Abstract
- Background: In recent years, biomedical ontologies have become important for describing existing biological knowledge in the form of knowledge graphs. Data mining approaches that work with knowledge graphs have been proposed, but they are based on vector representations that do not capture the full underlying semantics. An alternative is to use machine learning approaches that explore semantic similarity. However, since ontologies can model multiple perspectives, semantic similarity computations for a given learning task need to be fine-tuned to account for this. Obtaining the best combination of semantic similarity aspects for each learning task is not trivial and typically depends on expert knowledge.
- Results: We have developed a novel approach, evoKGsim, that applies Genetic Programming over a set of semantic similarity features, each based on a semantic aspect of the data, to obtain the best combination for a given supervised learning task. The approach was evaluated on several benchmark datasets for protein-protein interaction prediction using the Gene Ontology as the knowledge graph to support semantic similarity, and it outperformed competing strategies, including manually selected combinations of semantic aspects emulating expert knowledge. evoKGsim was also able to learn species-agnostic models with different combinations of species for training and testing, effectively addressing the limitations of predicting protein-protein interactions for species with fewer known interactions.
- Conclusions: evoKGsim can overcome one of the limitations in knowledge graph-based semantic similarity applications: the need to expertly select which aspects should be taken into account for a given application. Applying this methodology to protein-protein interaction prediction proved successful, paving the way to broader applications.
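The idea summarized in the abstract, evolving a combination of per-aspect semantic similarity scores for a classification task, can be illustrated with a toy sketch. This is not the evoKGsim implementation: it is a stripped-down evolutionary search (selection and subtree mutation only, no crossover) over small expression trees. The data values, the operator set, the 0.5 decision threshold, and all function names are assumptions made for illustration; the three features stand in for similarity under the Gene Ontology aspects (biological process, cellular component, molecular function).

```python
import operator
import random

# Hypothetical toy data: each row holds semantic similarity scores for a
# protein pair under three GO aspects (BP, CC, MF) and a label
# (1 = interacting, 0 = not). The values are invented for illustration.
DATA = [
    ((0.9, 0.2, 0.8), 1),
    ((0.8, 0.1, 0.7), 1),
    ((0.7, 0.9, 0.9), 1),
    ((0.2, 0.9, 0.3), 0),
    ((0.1, 0.8, 0.2), 0),
    ((0.3, 0.3, 0.1), 0),
]

OPS = [operator.add, operator.sub, operator.mul, max, min]  # assumed operator set
N_FEATURES = 3


def random_tree(depth, rng):
    """Leaves are feature indices (int); inner nodes are (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(N_FEATURES)
    return (rng.choice(OPS), random_tree(depth - 1, rng), random_tree(depth - 1, rng))


def evaluate(tree, features):
    """Compute the combined similarity score for one protein pair."""
    if isinstance(tree, int):
        return features[tree]
    op, left, right = tree
    return op(evaluate(left, features), evaluate(right, features))


def accuracy(tree, data, threshold=0.5):
    """Fitness: fraction of pairs classified correctly at the assumed threshold."""
    hits = sum((evaluate(tree, f) >= threshold) == bool(label) for f, label in data)
    return hits / len(data)


def mutate(tree, rng, depth=2):
    """Replace a randomly chosen subtree with a fresh random subtree."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(depth, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng, depth), right)
    return (op, left, mutate(right, rng, depth))


def evolve(data, generations=30, pop_size=40, seed=0):
    """Keep the fitter half each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_tree(3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: accuracy(t, data), reverse=True)
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(rng.choice(survivors), rng) for _ in survivors]
    return max(population, key=lambda t: accuracy(t, data))


best = evolve(DATA)
print("training accuracy of evolved combination:", accuracy(best, DATA))
```

A realistic setup would compute the features from an actual semantic similarity measure over Gene Ontology annotations, add crossover, and score candidates on held-out data rather than on the training pairs.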
- Subjects :
- Computer science
Knowledge Bases
Genetic programming
02 engineering and technology
Ontology (information science)
Machine learning
computer.software_genre
Semantics
lcsh:Computer applications to medicine. Medical informatics
Biochemistry
Task (project management)
Open Biomedical Ontologies
03 medical and health sciences
Semantic similarity
Structural Biology
Similarity (psychology)
0202 electrical engineering, electronic engineering, information engineering
Data Mining
Humans
Molecular Biology
lcsh:QH301-705.5
030304 developmental biology
Knowledge graph
Protein-protein interaction prediction
0303 health sciences
business.industry
Ontology
Applied Mathematics
Supervised learning
Computer Science Applications
Biological Ontologies
lcsh:Biology (General)
lcsh:R858-859.7
020201 artificial intelligence & image processing
Gene ontology
Supervised Machine Learning
Artificial intelligence
Algorithms
Research Article
Details
- Language :
- English
- ISSN :
- 14712105
- Volume :
- 21
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- BMC Bioinformatics
- Accession number :
- edsair.doi.dedup.....763a67729b0a19cf81ca5fb964756708