A novel method for data fusion over Entity-Relation graphs and its application to protein-protein interaction prediction
- Author
Daniele Raimondi, Jaak Simm, Adam Arany, and Yves Moreau
- Subjects
Statistics and Probability; Relation (database); Computer science; Cell; 02 engineering and technology; Machine learning; computer.software_genre; Biochemistry; Matrix decomposition; Protein–protein interaction; 03 medical and health sciences; 0202 electrical engineering, electronic engineering, information engineering; medicine; Molecular Biology; 030304 developmental biology; 0303 health sciences; Protein function; business.industry; Scale (chemistry); Sensor fusion; Computer Science Applications; Computational Mathematics; medicine.anatomical_structure; Computational Theory and Mathematics; Proteome; 020201 artificial intelligence & image processing; Protein–protein interaction prediction; Artificial intelligence; State (computer science); business; computer
- Abstract
Motivation: Modern bioinformatics faces increasingly complex problems, and we are rapidly approaching an era in which the ability to seamlessly integrate heterogeneous sources of information will be crucial for scientific progress. Here, we present a novel non-linear data fusion framework that generalizes the conventional matrix factorization paradigm, allowing inference over arbitrary entity-relation graphs, and we apply it to the prediction of protein–protein interactions (PPIs). Improving our knowledge of PPI networks at the proteome scale is crucial for understanding protein function, physiological and disease states, and cell life in general.
Results: We devised three data fusion-based models for the proteome-level prediction of PPIs and show that our method outperforms state-of-the-art approaches on common benchmarks. Moreover, we investigate its predictions on newly published PPIs and show that these new data exhibit a clear shift in their underlying distributions. We therefore train and test our models on this extended dataset.
Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2020
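The abstract describes the framework only as a non-linear generalization of matrix factorization to arbitrary entity-relation graphs. The sketch below illustrates the underlying idea with plain *linear* collective matrix factorization, which is a simpler technique and not the authors' method. Each entity type (graph node) gets one latent factor matrix. Each relation (graph edge) is approximated by inner products of its two endpoint types' factors. Sharing those factors across relations is what fuses the data sources. The entity types, the toy relations and all names (`fit_collective_mf`, `k`, `lr`, `reg`) are hypothetical and chosen for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mse(factors, relations):
    """Mean squared reconstruction error over all observed entries of all relations."""
    errors = [
        (dot(factors[ta][i], factors[tb][j]) - value) ** 2
        for (ta, tb), entries in relations.items()
        for i, j, value in entries
    ]
    return sum(errors) / len(errors)


def fit_collective_mf(n_entities, relations, k=4, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Linear collective matrix factorization (illustrative, hypothetical API).

    n_entities: entity type -> number of entities of that type (graph nodes).
    relations: (type_a, type_b) -> observed (i, j, value) triples (graph edges).
    Each entity type gets ONE latent matrix, shared by every relation it takes
    part in; this sharing lets one data source inform predictions in another.
    """
    rng = random.Random(seed)
    factors = {
        t: [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n)]
        for t, n in n_entities.items()
    }
    for _ in range(epochs):
        for (ta, tb), entries in relations.items():
            for i, j, value in entries:
                u, v = factors[ta][i], factors[tb][j]
                err = dot(u, v) - value
                for d in range(k):
                    ud, vd = u[d], v[d]
                    # SGD step on 0.5 * err**2 + 0.5 * reg * (|u|^2 + |v|^2)
                    u[d] -= lr * (err * vd + reg * ud)
                    v[d] -= lr * (err * ud + reg * vd)
    return factors


# Hypothetical toy graph: 4 proteins and 2 GO terms, linked by two relations.
n_entities = {"protein": 4, "go": 2}
relations = {
    # observed PPIs (1.0) and known non-interactions (0.0)
    ("protein", "protein"): [(0, 1, 1.0), (2, 3, 1.0), (0, 2, 0.0), (1, 3, 0.0)],
    # protein-to-GO-term annotations
    ("protein", "go"): [(0, 0, 1.0), (1, 0, 1.0), (2, 1, 1.0), (3, 1, 1.0)],
}
factors = fit_collective_mf(n_entities, relations)
# Score an unobserved protein pair using the shared protein factors.
unobserved_score = dot(factors["protein"][0], factors["protein"][3])
```

The protein-protein relation reuses the same protein factors on both sides, so predicted interaction scores are symmetric by construction. A non-linear variant, as the abstract describes, would replace the plain inner product with a more expressive function. This record does not state how the authors do so.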