Finding functional associations between prokaryotic virus orthologous groups: a proof of concept
- Source :
- BMC Bioinformatics, Vol 22, Iss 1, Pp 1-11 (2021), BMC Bioinformatics, 22(1), 1. NLM (Medline), BMC Bioinformatics
- Publication Year :
- 2021
Abstract
- Background: The field of viromics has greatly benefited from recent developments in metagenomics, with significant efforts focusing on viral discovery. However, functional annotation of the increasing number of viral genomes is lagging behind. This is highlighted by the degree of annotation of the protein clusters in the prokaryotic Virus Orthologous Groups (pVOGs) database, with 83% of its current 9518 pVOGs having an unknown function.
Results: In this study we describe a machine learning approach to explore potential functional associations between pVOGs. We measure seven genomic features and use them as input to a Random Forest classifier to predict protein–protein interactions between pairs of pVOGs. After systematic evaluation of the model’s performance on 10 different datasets, we obtained a predictor with a mean accuracy of 0.77 and an Area Under the Receiver Operating Characteristic curve (AUROC) score of 0.83. Its application to a set of 2,133,027 pVOG-pVOG interactions allowed us to predict 267,265 putative interactions with a reported probability greater than 0.65. At an expected false discovery rate of 0.27, we placed 95.6% of the previously unannotated pVOGs in a functional context, by predicting their interaction with a pVOG that is functionally annotated.
Conclusions: We believe that this proof-of-concept methodology, wrapped in a reproducible and automated workflow, can represent a significant step towards obtaining a more complete picture of bacteriophage biology.
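The figures quoted in the abstract can be related to one another with some simple arithmetic. The following is a minimal sketch, not the authors' workflow: the function names and the pair identifiers in the usage note are hypothetical, the "strictly greater than 0.65" rule is read directly from the abstract's wording, and the pVOG counts derived from the rounded 83% figure are approximations only.

```python
# Figures as quoted in the abstract (the 83% value is rounded there).
TOTAL_PVOGS = 9518
UNKNOWN_FRACTION = 0.83
CANDIDATE_PAIRS = 2_133_027
PREDICTED_PAIRS = 267_265
PROB_THRESHOLD = 0.65
EXPECTED_FDR = 0.27
PLACED_FRACTION = 0.956


def keep_confident(scored_pairs, threshold=PROB_THRESHOLD):
    """Keep (pvog_a, pvog_b, probability) tuples scoring strictly above threshold.

    Hypothetical helper: mirrors the abstract's "probability greater than 0.65".
    """
    return [(a, b, p) for a, b, p in scored_pairs if p > threshold]


def summarize():
    """Derive approximate counts implied by the abstract's reported figures."""
    unknown = round(TOTAL_PVOGS * UNKNOWN_FRACTION)
    return {
        # Share of candidate pairs that pass the probability threshold.
        "fraction_of_pairs_kept": PREDICTED_PAIRS / CANDIDATE_PAIRS,
        # Expected split of the predicted pairs at the stated FDR.
        "expected_false_positives": round(PREDICTED_PAIRS * EXPECTED_FDR),
        "expected_true_positives": round(PREDICTED_PAIRS * (1 - EXPECTED_FDR)),
        # Approximate, because 83% is a rounded percentage.
        "approx_unknown_pvogs": unknown,
        "approx_unknown_pvogs_placed": round(unknown * PLACED_FRACTION),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

As a usage example, `keep_confident([("VOG_A", "VOG_B", 0.90), ("VOG_A", "VOG_C", 0.65)])` keeps only the first tuple, since a score of exactly 0.65 does not exceed the threshold. The identifiers here are placeholders, not real pVOG accessions.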
- Subjects :
- False discovery rate
QH301-705.5
Computer science
Computer applications to medicine. Medical informatics
R858-859.7
Context (language use)
Genome, Viral
Computational biology
Biochemistry
03 medical and health sciences
Annotation
Function prediction
0302 clinical medicine
Structural Biology
Machine learning
Bacteriophages
Biology (General)
Set (psychology)
Molecular Biology
030304 developmental biology
0303 health sciences
Methodology Article
Applied Mathematics
Genomics
Random forest
Computer Science Applications
Workflow
Prokaryotic Cells
Metagenomics
Viruses
DNA microarray
030217 neurology & neurosurgery
Details
- Language :
- English
- ISSN :
- 1471-2105
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....e109211e852a8e0c06afe7a50f8f9958