Jean-Philippe Vert, Véronique Stoven, Olivier Collier; Modélisation aléatoire de Paris X (MODAL'X), Université Paris Nanterre (UPN); Centre de Bioinformatique (CBIO), Mines Paris - PSL (École nationale supérieure des mines de Paris), Université Paris sciences et lettres (PSL); Institut Curie [Paris]; Cancer et génome: Bioinformatique, biostatistiques et épidémiologie d'un système complexe, Université Paris sciences et lettres (PSL), Institut Curie [Paris], Institut National de la Santé et de la Recherche Médicale (INSERM); Brain team [Paris], Research at Google; MINES ParisTech - École nationale supérieure des mines de Paris, PSL Research University (PSL); Bodescot, Myriam. This work was supported by the European Research Council (https://erc.europa.eu), grant ERC-SMAC-280032 (OC, JPV), the Labex MME-DII (https://labex-mme-dii.u-cergy.fr), grant ANR11-LBX-0023-01 (OC), and Google LLC (JPV). Projects: Modèles Mathématiques et Economiques de la Dynamique, de l'Incertitude et des Interactions (MME-DII), ANR-11-LABX-0023, 2011; Statistical machine learning for complex biological data (SMAC), European Project 280032, EC:FP7:ERC, ERC-2011-StG_20101014, 2012-02-01 to 2017-01-31.
Cancer driver genes, i.e., oncogenes and tumor suppressor genes, are involved in the acquisition of important functions in tumors, providing a selective growth advantage, allowing uncontrolled proliferation and avoiding apoptosis. It is therefore important to identify these driver genes, both for the fundamental understanding of cancer and to help find new therapeutic targets or biomarkers. Although the most frequently mutated driver genes have been identified, it is believed that many more remain to be discovered, particularly driver genes specific to some cancer types. In this paper, we propose a new computational method called LOTUS to predict new driver genes. LOTUS is a machine-learning-based approach that integrates various types of data in a versatile manner, including information about gene mutations and protein-protein interactions. In addition, LOTUS can predict cancer driver genes in a pan-cancer setting as well as for specific cancer types, using a multitask learning strategy to share information across cancer types. We empirically show that LOTUS outperforms five other state-of-the-art driver gene prediction methods, both in terms of intrinsic consistency and prediction accuracy, and provide predictions of new cancer genes across many cancer types.

Author summary

Cancer development is driven by mutations and dysfunction of important genes, so-called cancer driver genes, that could be targeted by specific therapies. While a number of such cancer genes have already been identified, it is believed that many more remain to be discovered. To help prioritize experimental investigations of candidate genes, several computational methods have been proposed to rank promising candidates based on their mutations in large cohorts of cancer cases, or on their interactions with known driver genes in biological networks. We propose LOTUS, a new computational approach to identify genes with high oncogenic potential.
LOTUS implements a machine learning approach to learn an oncogenic potential score from known driver genes, and brings two novelties compared to existing methods. First, it makes it easy to combine heterogeneous sources of information into the scoring function, which we illustrate by learning a scoring function from both known mutations in large cancer cohorts and interactions in biological networks. Second, using a multitask learning strategy, it can predict different driver genes for different cancer types, while sharing information between them to improve the prediction for every type. We provide experimental results showing that LOTUS significantly outperforms several state-of-the-art cancer gene prediction tools.
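
The two ideas in this summary, combining heterogeneous gene similarities and sharing information across cancer types, can be illustrated with a small toy sketch. It is not the LOTUS implementation: the summary does not specify the learning algorithm, so the sketch substitutes a simple average-similarity scorer. The gene names, feature values, interaction list, similarity functions, and the `shared` weight are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy data (not from the paper).
MUTATION_FEATURES = {
    "GENE_A": [0.9, 0.8],
    "GENE_B": [0.8, 0.9],
    "GENE_C": [0.1, 0.0],
    "GENE_D": [0.2, 0.1],
}
INTERACTIONS = {("GENE_A", "GENE_B"), ("GENE_B", "GENE_D")}
# Known drivers given as (gene, cancer type) pairs.
KNOWN_DRIVERS = [("GENE_A", "breast"), ("GENE_B", "lung")]


def mutation_similarity(g, h, gamma=1.0):
    """Gaussian similarity between two genes' mutation feature vectors."""
    sq_dist = sum((x - y) ** 2
                  for x, y in zip(MUTATION_FEATURES[g], MUTATION_FEATURES[h]))
    return math.exp(-gamma * sq_dist)


def network_similarity(g, h):
    """Crude network similarity: 1 for the same gene, 0.5 for direct interactors."""
    if g == h:
        return 1.0
    if (g, h) in INTERACTIONS or (h, g) in INTERACTIONS:
        return 0.5
    return 0.0


def gene_similarity(g, h):
    """Combine the two data sources by averaging their similarities."""
    return 0.5 * (mutation_similarity(g, h) + network_similarity(g, h))


def task_similarity(s, t, shared=0.5):
    """Cancer types always share a baseline weight; identical types get full weight."""
    return shared + (1.0 - shared) * (1.0 if s == t else 0.0)


def score(gene, cancer_type):
    """Average similarity of (gene, cancer_type) to all known (driver, type) pairs."""
    total = sum(gene_similarity(gene, driver) * task_similarity(cancer_type, t)
                for driver, t in KNOWN_DRIVERS)
    return total / len(KNOWN_DRIVERS)


def rank_candidates(cancer_type):
    """Rank genes that are not known drivers, highest score first."""
    known = {g for g, _ in KNOWN_DRIVERS}
    candidates = [g for g in MUTATION_FEATURES if g not in known]
    return sorted(candidates, key=lambda g: score(g, cancer_type), reverse=True)


if __name__ == "__main__":
    for cancer_type in ("breast", "lung", "colon"):
        print(cancer_type, rank_candidates(cancer_type))
```

In this toy setting, a candidate that interacts with a lung driver scores higher for lung than for breast, yet still receives a nonzero score for a cancer type with no known drivers, because the task similarity never drops to zero. Adding another data source would only require another similarity function inside `gene_similarity`.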