1. Variational Autoencoders for Cancer Data Integration: Design Principles and Computational Practice
- Author
- Helena Andrés-Terré [0000-0001-7199-7897], Ifrah Tariq, Nikola Simidjievski [0000-0003-3948-6370], Paul Scherer [0000-0002-2240-7501], Cristian Bodnar, Zohreh Shams [0000-0002-0143-798X], Pietro Liò [0000-0002-0540-5053], Mateja Jamnik [0000-0003-2772-2532], and Apollo - University of Cambridge Repository
- Subjects
0301 basic medicine, Artificial intelligence, lcsh:QH426-470, Integrative Data Analyses, Computer science, Design elements and principles, Machine learning, computer.software_genre, Multi-omic Analysis, Machine Learning, Variational Autoencoder, 03 medical and health sciences, 0302 clinical medicine, Breast cancer, Deep Learning, medicine, Genetics, Cancer–breast Cancer, Genetics (clinical), Original Research, Artificial neural network, business.industry, Deep learning, Cancer, Patient survival, medicine.disease, Autoencoder, Cancer data, Variety (cybernetics), lcsh:Genetics, 030104 developmental biology, ComputingMethodologies_PATTERNRECOGNITION, 030220 oncology & carcinogenesis, FOS: Biological sciences, Molecular Medicine, Bioinformatics, business, computer
- Abstract
International initiatives such as the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), The Cancer Genome Atlas (TCGA), and the International Cancer Genome Consortium (ICGC) are collecting multiple data sets at different genome scales with the aim of identifying novel cancer biomarkers and predicting patient survival. To analyse such data, several machine learning, bioinformatics and statistical methods have been applied, among them neural networks such as autoencoders. Although these models provide a good statistical learning framework for analysing multi-omic and/or clinical data, there is a distinct lack of work on how to integrate diverse patient data and identify the design best suited to the available data. In this paper, we investigate several autoencoder architectures that integrate a variety of cancer patient data types (e.g., multi-omics and clinical data). We perform extensive analyses of these approaches and provide a clear methodological and computational framework for designing systems that enable clinicians to investigate cancer traits and translate the results into clinical applications. We demonstrate how these networks can be designed, built and, in particular, applied to tasks of integrative analyses of heterogeneous breast cancer data. The results show that these approaches yield relevant data representations that, in turn, lead to accurate and stable diagnosis.
- Published
- 2020