1. Network embedding unveils the hidden interactions in the mammalian virome
- Author
Timothée Poisot, Marie-Andrée Ouellet, Nardus Mollentze, Maxwell J. Farrell, Daniel J. Becker, Liam Brierley, Gregory F. Albery, Rory J. Gibb, Stephanie N. Seifert, and Colin J. Carlson
- Subjects
DSML 3: Development/pre-production: Data science output has been rolled out/validated across multiple domains/problems, Computer software, QA76.75-76.765
- Abstract
Summary: Predicting host-virus interactions is fundamentally a network science problem. We develop a method for bipartite network prediction that combines a recommender system (linear filtering) with an imputation algorithm based on low-rank graph embedding. We test this method by applying it to a global database of mammal-virus interactions and thus show that it makes biologically plausible predictions that are robust to data biases. We find that the mammalian virome is under-characterized anywhere in the world. We suggest that future virus discovery efforts could prioritize the Amazon Basin (for its unique coevolutionary assemblages) and sub-Saharan Africa (for its poorly characterized zoonotic reservoirs). Graph embedding of the imputed network improves predictions of human infection from viral genome features, providing a shortlist of priorities for laboratory studies and surveillance. Overall, our study indicates that the global structure of the mammal-virus network contains a large amount of information that is recoverable, and this provides new insights into fundamental biology and disease emergence.
The bigger picture: Documenting all interactions between viruses and mammals is not feasible; viruses are too small, the world is too big, and viruses and mammals are too diverse. As a consequence, we think we only know about 1% or 2% of the interactions between mammals and viruses. This is a critical gap in our knowledge because it can lead us to missing reservoirs of possible zoonotic viruses. In this article, we develop a process to leverage the information we have about interactions between hosts and viruses to do three things: First, we predict missing interactions in this network and give them a score based on how likely the model guesses they are. Second, we map these predicted interactions in space to provide guidance about where to go and what to look for to collect data that would maximize our knowledge of host-virus interactions. Finally, based on the predicted interactions, we use information about the genome of viruses to identify possible zoonotic viruses.
- Published
2023