Exploring Approaches for Detecting Protein Functional Similarity within an Orthology-based Framework
- Source :
- Scientific Reports, Vol 7, Iss 1, Pp 1-15 (2017)
- Publication Year :
- 2017
- Publisher :
- Springer Science and Business Media LLC, 2017.
Abstract
- Protein functional similarity based on gene ontology (GO) annotations serves as a powerful tool for comparing proteins on a functional level in applications such as protein-protein interaction prediction, gene prioritization, and disease gene discovery. Functional similarity (FS) is usually quantified by combining the GO hierarchy with an annotation corpus that links genes and gene products to GO terms. One large group of algorithms calculates the GO term semantic similarity (SS) between all the terms annotating the two proteins, followed by a second step, described as a “mixing strategy”, which combines the SS values to yield the final FS value. Due to the variability of protein annotation caused, for example, by annotation bias, this value cannot be reliably compared on an absolute scale. We therefore introduce a similarity z-score that takes into account the FS background distribution of each protein. For a selection of popular SS measures and mixing strategies, we demonstrate a moderate accuracy improvement when using z-scores in a benchmark that aims to separate orthologous cases from random gene pairs and, in this context, discuss the impact of annotation corpus choice. The approach has been implemented in Frela, a fast high-throughput public web server for protein FS calculation and interpretation.
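The two-step scheme described in the abstract (pairwise SS values, a mixing strategy, then a z-score against a background distribution) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper or from Frela: the term labels `t1` to `t3`, the toy SS lookup table, the choice of best-match average as the mixing strategy, and the plain (value minus mean) over standard deviation form of the z-score.

```python
from statistics import mean, pstdev

# Hypothetical toy SS values between placeholder terms (not real GO data).
TOY_SS = {
    frozenset(("t1", "t2")): 0.2,
    frozenset(("t1", "t3")): 0.1,
    frozenset(("t2", "t3")): 0.5,
}


def toy_ss(term_a, term_b):
    """Symmetric semantic similarity lookup; identical terms score 1.0."""
    if term_a == term_b:
        return 1.0
    return TOY_SS.get(frozenset((term_a, term_b)), 0.0)


def best_match_average(ss, terms_a, terms_b):
    """One possible mixing strategy: average each term's best match in the
    other protein's term set, then average the two directions."""
    best_a = [max(ss(a, b) for b in terms_b) for a in terms_a]
    best_b = [max(ss(a, b) for a in terms_a) for b in terms_b]
    return (mean(best_a) + mean(best_b)) / 2


def z_score(fs_value, background):
    """Standardize an FS value against a protein's background FS values.
    Assumed form: (value - mean) / population standard deviation."""
    sigma = pstdev(background)
    if sigma == 0:
        return 0.0  # degenerate background: no spread to standardize against
    return (fs_value - mean(background)) / sigma


# FS between two hypothetical proteins annotated with placeholder terms.
fs = best_match_average(toy_ss, ["t1", "t2"], ["t1", "t3"])

# Hypothetical background: FS of the first protein against other proteins.
z = z_score(fs, [0.25, 0.5, 0.75])
```

Here `fs` is a raw similarity on the scale of the SS measure, while `z` expresses how far that value lies from the protein's typical similarity to other proteins, which is the kind of normalization the abstract motivates with annotation bias.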
- Subjects :
- 0301 basic medicine
Databases, Factual
Computer science
Science
0206 medical engineering
Value (computer science)
Context (language use)
02 engineering and technology
Bioinformatics
computer.software_genre
Article
03 medical and health sciences
Annotation
Protein Annotation
Semantic similarity
Similarity (network science)
Gene
Disease gene
Multidisciplinary
Hierarchy (mathematics)
business.industry
Computational Biology
Proteins
Molecular Sequence Annotation
Gene Ontology
ComputingMethodologies_PATTERNRECOGNITION
030104 developmental biology
Benchmark (computing)
Medicine
Artificial intelligence
business
computer
020602 bioinformatics
Natural language processing
Details
- ISSN :
- 20452322
- Volume :
- 7
- Database :
- OpenAIRE
- Journal :
- Scientific Reports
- Accession number :
- edsair.doi.dedup.....7865a408bf68f36218fe7a496f2bf859
- Full Text :
- https://doi.org/10.1038/s41598-017-00465-5