1. Deciphering the Methylation Landscape in Breast Cancer: Diagnostic and Prognostic Biosignatures through Automated Machine Learning
- Author
Maria Panagopoulou, Ioannis Tsamardinos, Ekaterini Chatzaki, Makrina Karaglani, Ioannis Iliopoulos, and Vangelis G. Manolopoulos
- Subjects
0301 basic medicine, Cancer Research, Disease, Biology, Machine learning, lcsh:RC254-282, Article, predictive model, 03 medical and health sciences, 0302 clinical medicine, Breast cancer, breast cancer, Transcription (biology), medicine, Breast carcinogenesis, skin and connective tissue diseases, Gene, pathway, Early disease, Methylation, bioinformatics, lcsh:Neoplasms. Tumors. Oncology. Including cancer and carcinogens, 030104 developmental biology, machine learning, Oncology, 030220 oncology & carcinogenesis, DNA methylation, methylation, signature, transcription, Artificial intelligence
- Abstract
Simple Summary: Breast cancer (BrCa) is characterized by aberrant DNA methylation. We leveraged high-throughput methylation data from BrCa and normal breast tissues and identified 11,176 to 27,786 differentially methylated genes (DMGs) against clinically relevant end-points. Innovative automated machine learning was employed to construct three highly performing signatures for (1) the discrimination of BrCa patients from healthy individuals, (2) the identification of BrCa metastatic disease and (3) the early diagnosis of BrCa. Furthermore, functional analysis revealed that most genes selected in the signatures showed associations with BrCa, with regulation of transcription being the main biological process, the nucleus being the main cellular component and transcription factor activity and sequence-specific DNA binding being the main molecular functions. Overall, revisiting methylome datasets led to three high-performance signatures that are readily available for improving BrCa precision management and to significant knowledge mining related to disease pathophysiology.

Abstract: DNA methylation plays an important role in breast cancer (BrCa) pathogenesis and could contribute to driving its personalized management. We performed a complete bioinformatic analysis of BrCa whole-methylome datasets generated with the Illumina methylation 450 bead-chip array. Differential methylation analysis vs. clinical end-points resulted in 11,176 to 27,786 differentially methylated genes (DMGs). Innovative automated machine learning (AutoML) was employed to construct signatures with translational value. Three highly performing and low-feature-number signatures were built:
(1) a 5-gene signature discriminating BrCa patients from healthy individuals (area under the curve (AUC): 0.994 (0.982–1.000));
(2) a 3-gene signature identifying BrCa metastatic disease (AUC: 0.986 (0.921–1.000));
(3) six equivalent 5-gene signatures diagnosing early disease (AUC: 0.973 (0.920–1.000)).
Validation in independent patient groups verified performance. Bioinformatic tools for functional analysis and protein interaction prediction were also employed. All protein-encoding features included in the signatures were associated with BrCa-related pathways. Functional analysis of DMGs highlighted the regulation of transcription as the main biological process, the nucleus as the main cellular component and transcription factor activity and sequence-specific DNA binding as the main molecular functions. Overall, three high-performance diagnostic/prognostic signatures were built and are readily available for improving BrCa precision management upon prospective clinical validation. Revisiting archived methylomes through novel bioinformatic approaches yielded significant knowledge clarifying the contribution of gene methylation events to breast carcinogenesis.
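
The AUC values above are each reported with an interval in parentheses. The abstract does not say how those intervals were obtained, so the following is only a minimal sketch of one common approach, a percentile bootstrap around a rank-based AUC, and should not be read as the authors' procedure. The scores are synthetic, and the names `auc`, `bootstrap_auc_ci`, `tumor_scores` and `normal_scores` are hypothetical.

```python
import random


def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def bootstrap_auc_ci(scores_pos, scores_neg, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Resample each class with replacement, keeping class sizes fixed.
        pos = [rng.choice(scores_pos) for _ in scores_pos]
        neg = [rng.choice(scores_neg) for _ in scores_neg]
        stats.append(auc(pos, neg))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic classifier scores, NOT data from the study.
rng = random.Random(42)
tumor_scores = [rng.gauss(0.8, 0.1) for _ in range(40)]
normal_scores = [rng.gauss(0.3, 0.1) for _ in range(40)]

point = auc(tumor_scores, normal_scores)
low, high = bootstrap_auc_ci(tumor_scores, normal_scores)
print(f"AUC: {point:.3f} ({low:.3f}-{high:.3f})")
```

With well-separated classes the interval is capped at 1.000, which matches the shape of the intervals quoted in the abstract. Any bias correction an AutoML tool may apply is outside the scope of this sketch.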
- Published
- 2021