Identification of a Sixteen-gene Prognostic Biomarker for Lung Adenocarcinoma Using a Machine Learning Method
- Source :
- Journal of Cancer
- Publication Year :
- 2020
- Publisher :
- Ivyspring International Publisher, 2020.
Abstract
- Objectives: Lung adenocarcinoma (LUAD) accounts for a large share of cancer-related deaths worldwide each year, so identifying prognostic biomarkers and predicting prognosis for LUAD patients is necessary. Materials and Methods: LUAD RNA-Seq data and clinical data from The Cancer Genome Atlas (TCGA) were divided into TCGA cohort I (n = 338) and cohort II (n = 168). Cohort I was used for model construction; cohort II and two Gene Expression Omnibus datasets (GSE72094 cohort, n = 393; GSE11969 cohort, n = 149) were used for validation. First, survival-related seed genes were selected from cohort I using a machine learning model (random survival forest, RSF); then, to improve prediction accuracy, a forward selection model was used to identify the prognosis-related key genes among the seed genes using the clinically integrated RNA-Seq data. Second, a survival risk score system was constructed from these key genes in cohort II, the GSE72094 cohort and the GSE11969 cohort, and evaluation metrics (hazard ratio (HR), p value and concordance index (C-index)) were calculated to validate the proposed method. Third, the developed approach was compared with five previously published prediction models. Finally, bioinformatics analyses (pathway, heatmap, protein-gene interaction network) were applied to the identified seed genes and key genes. Results and Conclusion: Based on the RSF model and clinically integrated RNA-Seq data, we identified sixteen key genes that formed the prognostic gene expression signature. These sixteen key genes achieved strong prognostic power for LUAD patients in cohort II (HR = 3.80, p = 1.63e-06, C-index = 0.656), and were further validated in the GSE72094 cohort (HR = 4.12, p = 1.34e-10, C-index = 0.672) and the GSE11969 cohort (HR = 3.87, p = 6.81e-07, C-index = 0.670).
The results from the three independent validation cohorts showed that, compared with the traditional Cox model and with the use of standalone RNA-Seq data, the machine-learning-based method improved the prediction accuracy of LUAD prognosis, and the derived model also outperformed five existing prediction models. KEGG pathway analysis found that eleven of the sixteen genes were associated with nicotine addiction. Thirteen of the sixteen genes were reported for the first time as LUAD prognosis-related key genes. In conclusion, we developed a sixteen-gene prognostic marker for LUAD, which may provide a powerful prognostic tool for precision oncology.
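The abstract reports a C-index for each validation cohort. As a rough illustration of what that metric measures, the sketch below computes Harrell's concordance index from survival times, event indicators and risk scores. It is a minimal sketch and not the authors' implementation: the function name `concordance_index`, its argument names, and the choice to skip pairs with tied survival times are assumptions made for this example.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times  : observed follow-up time for each patient
    events : 1 if death was observed, 0 if the patient was censored
    risks  : model risk score (higher means worse predicted prognosis)

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. Pairs with tied times
    are skipped here, which is a simplifying assumption.
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("times, events and risks must have the same length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier death had the higher risk score
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half

    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which gives a scale for reading the reported values of roughly 0.66 to 0.67.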
- Subjects :
- Lung adenocarcinoma
0301 basic medicine
Key genes
Prognosis prediction
Machine learning
computer.software_genre
RNA-Seq data
Forward selection model
03 medical and health sciences
0302 clinical medicine
medicine
Prognostic biomarker
Gene
Framingham Risk Score
business.industry
Proportional hazards model
Random survival forest
medicine.disease
Nicotine Addiction
030104 developmental biology
Oncology
030220 oncology & carcinogenesis
Cohort
Adenocarcinoma
Artificial intelligence
business
computer
Research Paper
Details
- ISSN :
- 18379664
- Volume :
- 11
- Database :
- OpenAIRE
- Journal :
- Journal of Cancer
- Accession number :
- edsair.doi.dedup.....b8f734d04a203b1d4a7ca897f0661ec0
- Full Text :
- https://doi.org/10.7150/jca.34585