1. COMBING: Clustering in Oncology for Mathematical and Biological Identification of Novel Gene Signatures
- Author
Battistella, Enzo; Vakalopoulou, Maria; Sun, Roger; Estienne, Théo; Lerousseau, Marvin; Nikolaev, Sergey; Andres, Émilie; Carré, Alexandre; Niyoteka, Stéphane; Robert, Charlotte; Paragios, Nikos; Alvarez Andres, Emilie; Deutsch, Eric
- Affiliations
Radiothérapie Moléculaire et Innovation Thérapeutique (RaMo-IT), Institut Gustave Roussy (IGR)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Paris-Saclay, Centre de vision numérique (CVN), Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay, Institut Gustave Roussy (IGR), CentraleSupélec, OPtimisation Imagerie et Santé (OPIS), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de vision numérique (CVN), Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-CentraleSupélec-Université Paris-Saclay, Département d'anesthésie [Gustave Roussy], Hopital Saint-Louis [AP-HP] (AP-HP), Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP), Physique médicale, Département de radiothérapie [Gustave Roussy], Institut Gustave Roussy (IGR)-Institut Gustave Roussy (IGR), Département d’Innovation Thérapeutique et essais précoces [Gustave Roussy] (DITEP)
- Subjects
Computer science, Applied Mathematics, Genomics, Computational biology, Gene signature, Precision medicine, Bottleneck, Signature (logic), Clustering, Identification (information), Multi-tumor association, Genetics, Data analysis, [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM], Cluster analysis, Predictive Signature, Biomarkers, Biotechnology
- Abstract
Precision medicine is a paradigm shift in healthcare relying heavily on genomics data. However, the complexity of biological interactions, the large number of genes, and the lack of comparisons on the analysis of data remain a tremendous bottleneck regarding clinical adoption. In this paper, we introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers. Our method is based on the LP-Stability algorithm, a high-dimensional, center-based unsupervised clustering algorithm. It offers modularity with respect to metric functions and scalability, while being able to automatically determine the best number of clusters. Our evaluation includes both mathematical and biological criteria to define a quantitative metric. The recovered signature is applied to a variety of biological tasks, including screening of biological pathways and functions, and characterization relevance on tumor types and subtypes. Quantitative comparisons among different distance metrics, commonly used clustering methods, and a referential gene signature used in the literature confirm the state-of-the-art performance of our approach. In particular, our signature, based on 27 genes, reports at least 30 times better mathematical significance (average Dunn's Index) and 25% better biological significance (average Enrichment in Protein-Protein Interaction) than those produced by other referential clustering methods. Finally, our signature reports promising results on distinguishing immune inflammatory and immune desert tumors, while reporting a high balanced accuracy of 92% on tumor types classification and an averaged balanced accuracy of 68% on tumor subtypes classification, which represent, respectively, 7% and 9% higher performance compared to the referential signature.
- Published
2021