Identifying oncogenes as features for clinical cancer prognosis by Bayesian nonparametric variable selection algorithm
- Source :
- Chemometrics and Intelligent Laboratory Systems. 146:464-471
- Publication Year :
- 2015
- Publisher :
- Elsevier BV, 2015.
- Abstract :
- In clinical research, DNA microarrays are widely used to identify oncogenes, which are differentially expressed between two clinical states and are regarded as predictors for cancer prognosis. Because clinical samples are heterogeneous, the differentially expressed genes (DEGs) discovered by current statistical methods or machine learning algorithms include a number of genes unrelated to the phenotypic differences between the compared samples, which reduces the reliability of predictive models for cancer prognosis. In this study, we propose a Bayesian nonparametric variable selection algorithm, a stochastic, hierarchical search method, to separate the cancer-related genes from the DEG lists. The importance of each gene in a DEG list is inferred from the posterior distribution of the predicted clinical endpoints, which is simulated by a Markov chain Monte Carlo (MCMC) algorithm. The cancer-related genes were identified according to their importance and used to construct models for predicting three clinical endpoints: the estrogen receptor status (ER status) of breast cancer patients, the preoperative treatment response of breast cancer, and the overall survival milestone outcome of acute myeloid leukemia (OS of AML). The prediction accuracies for preoperative treatment response, ER status and OS of AML were 86%, 89% and 58%, respectively, and the corresponding Matthews correlation coefficients were 0.42, 0.77 and 0.33, all higher than those reported in previous studies. Furthermore, most of the genes identified by our method have been reported as oncogenes in the previous literature. Our results demonstrate that the proposed Bayesian nonparametric variable selection algorithm can efficiently identify oncogenes for cancer prognosis and enhance the performance of predictive models.
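The abstract reports both accuracy and the Matthews correlation coefficient (MCC) for each endpoint. As a reading aid, the sketch below computes both metrics from a binary confusion matrix using the standard textbook definitions. The labels are made-up toy data, not values from the paper, and the function names are illustrative assumptions rather than the authors' code. It shows how an imbalanced endpoint can give a high accuracy alongside a much lower MCC.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC; returns 0.0 when the denominator is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical, imbalanced toy data: 2 positives and 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"accuracy={acc:.3f}, MCC={mcc:.3f}")
```

Because MCC uses all four cells of the confusion matrix, it penalizes a classifier that mostly predicts the majority class, which accuracy alone does not.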
- Subjects :
- Process Chemistry and Technology
Posterior probability
Feature selection
Markov chain Monte Carlo
Biology
Computer Science Applications
Analytical Chemistry
Correlation
Breast cancer
medicine
Clinical endpoint
DNA microarray
Estrogen Receptor Status
Algorithm
Spectroscopy
Software
Details
- ISSN :
- 01697439
- Volume :
- 146
- Database :
- OpenAIRE
- Journal :
- Chemometrics and Intelligent Laboratory Systems
- Accession number :
- edsair.doi...........51943d7dbd8b60847b8e38e671732650
- Full Text :
- https://doi.org/10.1016/j.chemolab.2015.07.004