Graph Based Automatic Protein Function Annotation Improved by Semantic Similarity

Authors :
Navya Khare
Bishnu Sarker
Marie-Dominique Devignes
Sabeur Aridhi
Computational Algorithms for Protein Structures and Interactions (CAPSID)
Inria Nancy - Grand Est
Institut National de Recherche en Informatique et en Automatique (Inria)
Department of Complex Systems, Artificial Intelligence & Robotics (LORIA - AIS)
Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Centre National de la Recherche Scientifique (CNRS)
Université de Lorraine (UL)
International Institute of Information Technology, Hyderabad (IIIT-H)
Source :
IWBBIO 2020 - 8th International Work-Conference on Bioinformatics and Biomedical Engineering, Sep 2020 (also listed as May 2020), Granada, Spain. pp. 261-272, ⟨10.1007/978-3-030-45385-5_24⟩; Bioinformatics and Biomedical Engineering, ISBN: 9783030453848
Publication Year :
2020
Publisher :
HAL CCSD, 2020.

Abstract

Functional annotation of proteins is a challenging task: manual annotation demands a great deal of human effort, yet it remains nearly impossible to keep pace with the exponentially growing number of protein sequences entering public databases as a result of high-throughput sequencing technology. For example, the UniProt Knowledgebase (UniProtKB) is currently the largest and most comprehensive resource for protein sequence and annotation data. According to the November 2019 release of UniProtKB, some 561,000 sequences are manually reviewed, but over 150 million sequences lack reviewed functional annotations. Manual annotation is also expensive in both cost and time. At the same time, exploiting this huge quantity of data is important for understanding life at the molecular level and is central to understanding human disease processes and to drug discovery. To be useful, protein sequences need to be annotated with functional properties such as Enzyme Commission (EC) numbers and Gene Ontology (GO) terms. The ability to automatically annotate protein sequences in UniProtKB/TrEMBL, the non-reviewed UniProt sequence repository, would represent a major step towards bridging the gap between annotated and unannotated protein sequences. In this paper, we extend a neighborhood-based network inference technique for automatic GO annotation using a protein similarity graph built on protein domain and family information. Our approach assumes that proteins can be linked through the domains, families, and superfamilies that they share. We propose an efficient pruning and post-processing technique that integrates the semantic similarity of GO terms. Empirical results show that the proposed hierarchical post-processing can potentially improve the performance of other GO annotation tools as well.
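
The general idea of neighborhood-based annotation over a domain-sharing graph can be illustrated with a minimal sketch. This is not the authors' implementation: the Jaccard edge weight, the weighted-vote scoring, the 0.5 threshold, the function names, and the toy identifiers (`DOM_A`, `TERM_X`, and so on) are all assumptions chosen for illustration, and the semantic-similarity post-processing described in the abstract is not modeled here.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard index of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_go_terms(query_domains, annotated, threshold=0.5):
    """Score candidate terms for a query protein by weighted neighbor voting.

    `annotated` maps a protein id to (set of domain ids, set of terms).
    Neighbors are the annotated proteins sharing at least one domain with
    the query (an assumed similarity rule, not taken from the paper).
    """
    scores = defaultdict(float)
    total_weight = 0.0
    for domains, terms in annotated.values():
        weight = jaccard(query_domains, domains)
        if weight == 0.0:
            continue  # no shared domain, so not a neighbor
        total_weight += weight
        for term in terms:
            scores[term] += weight
    if total_weight == 0.0:
        return {}  # no neighbors, so nothing to infer
    # Normalize each score to [0, 1] and keep terms at or above the threshold.
    return {
        term: score / total_weight
        for term, score in scores.items()
        if score / total_weight >= threshold
    }


# Toy data with placeholder identifiers (hypothetical, not real accessions).
annotated = {
    "P1": ({"DOM_A", "DOM_B"}, {"TERM_X", "TERM_Y"}),
    "P2": ({"DOM_A"}, {"TERM_X"}),
    "P3": ({"DOM_C"}, {"TERM_Z"}),
}
predictions = predict_go_terms({"DOM_A"}, annotated)
print(predictions)
```

In this toy example only `P1` and `P2` share a domain with the query, so `P3` contributes nothing; a term carried by both neighbors outscores one carried only by the weaker neighbor.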

Details

Language :
English
ISBN :
978-3-030-45384-8
Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....d7ba596d72a0e98fee6a3d7025cfbdbe
Full Text :
https://doi.org/10.1007/978-3-030-45385-5_24