A machine learning-based method for feature reduction of methylation data for the classification of cancer tissue origin.

Authors :
De Velasco MA
Sakai K
Mitani S
Kura Y
Minamoto S
Haeno T
Hayashi H
Nishio K
Source :
International journal of clinical oncology [Int J Clin Oncol] 2024 Sep 18. Date of Electronic Publication: 2024 Sep 18.
Publication Year :
2024
Publisher :
Ahead of Print

Abstract

Background: Genome DNA methylation profiling is a promising yet costly method for cancer classification, involving substantial data. We developed an ensemble learning model to identify cancer types using methylation profiles from a limited number of CpG sites.

Methods: Analyzing methylation data from 890 samples across 10 cancer types from the TCGA database, we utilized ANOVA and Gain Ratio to select the most significant CpG sites, then employed Gradient Boosting to reduce these to just 100 sites.

Results: This approach maintained high accuracy across multiple machine learning models, with classification accuracy rates between 87.7% and 93.5% for methods including Extreme Gradient Boosting, CatBoost, and Random Forest. This method effectively minimizes the number of features needed without losing performance, helping to classify primary organs and uncover subgroups within specific cancers like breast and lung.

Conclusions: Using a gradient boosting feature selector shows potential for streamlining methylation-based cancer classification.

(© 2024. The Author(s).)
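Illustrative sketch (not from the article): the Methods describe ranking CpG sites with ANOVA before further reduction. The standard-library Python below shows one way such an ANOVA filter could look, assuming methylation beta values are arranged as rows of samples and columns of CpG sites. The function names `anova_f` and `top_sites` and the toy data are hypothetical, and the Gain Ratio and Gradient Boosting stages are not reproduced here.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic for a single CpG site across cancer types.

    Assumes at least two classes and more samples than classes.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Variation of class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_sites(beta, labels, n_keep):
    """Return indices of the n_keep columns with the highest F-statistic."""
    n_sites = len(beta[0])
    scores = [anova_f([row[j] for row in beta], labels) for j in range(n_sites)]
    ranked = sorted(range(n_sites), key=lambda j: scores[j], reverse=True)
    return ranked[:n_keep]


# Toy beta values (made up for illustration): 4 samples x 3 CpG sites.
beta = [
    [0.10, 0.50, 0.90],
    [0.12, 0.48, 0.10],
    [0.80, 0.52, 0.85],
    [0.82, 0.49, 0.15],
]
labels = ["breast", "breast", "lung", "lung"]

selected = top_sites(beta, labels, n_keep=1)
print(selected)
```

In this toy data, site 0 separates the two classes cleanly and ranks first, while site 2 varies widely within each class and ranks last. A filter of this kind only scores sites one at a time; the later model-based reduction described in the abstract would account for interactions between sites.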

Details

Language :
English
ISSN :
1437-7772
Database :
MEDLINE
Journal :
International journal of clinical oncology
Publication Type :
Academic Journal
Accession number :
39292320
Full Text :
https://doi.org/10.1007/s10147-024-02617-w