Prediction potential of candidate biomarker sets identified and validated on gene expression data from multiple datasets.

Authors :: Gormley M
Dampier W
Ertel A
Karacali B
Tozeren A
Source :: BMC bioinformatics [BMC Bioinformatics] 2007 Oct 26; Vol. 8, pp. 415. Date of Electronic Publication: 2007 Oct 26.
Publication Year :: 2007
Abstract ::
Background: Independently derived expression profiles of the same biological condition often have few genes in common. In this study, we created populations of expression profiles from publicly available microarray datasets of cancer (breast, lymphoma and renal) samples linked to clinical information with an iterative machine learning algorithm. ROC curves were used to assess the prediction error of each profile for classification. We compared the prediction error of profiles correlated with molecular phenotype against profiles correlated with relapse-free status. Prediction errors of profiles identified with supervised univariate feature selection algorithms were compared with those of profiles selected randomly from (a) all genes on the microarray platform and (b) a list of known disease-related genes (a priori selection). We also determined the relevance of expression profiles on test arrays from independent datasets, measured on either the same or different microarray platforms.

Results: Highly discriminative expression profiles were produced on both simulated gene expression data and expression data from breast cancer and lymphoma datasets on the basis of ER and BCL-6 expression, respectively. Use of relapse-free status to identify profiles for prognosis prediction resulted in poorly discriminative decision rules. Supervised feature selection resulted in more accurate classifications than random or a priori selection; however, the difference in prediction error decreased as the number of features increased. These results held when decision rules were applied across datasets to samples profiled on the same microarray platform.

Conclusion: Our results show that many gene sets predict molecular phenotypes accurately. Given this, expression profiles identified using different training datasets should be expected to show little agreement. In addition, we demonstrate the difficulty in predicting relapse directly from microarray data using supervised machine learning approaches. These findings are relevant to the use of molecular profiling for the identification of candidate biomarker panels.
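The record contains no code. As a rough illustration of the kind of comparison the abstract describes (supervised univariate feature selection versus random gene selection, scored by ROC area on an independent test set), here is a minimal sketch. Everything in it is an assumption: the simulated data, the Welch t-statistic ranking, the centroid-difference score, and all function names (`simulate`, `select_supervised`, `select_random`, `roc_auc`, `evaluate`) are hypothetical and are not the authors' iterative algorithm.

```python
# Hypothetical illustration only; not the method used in the paper.
import random
import statistics


def simulate(n_samples=60, n_genes=200, n_informative=10, shift=1.5, seed=0):
    """Simulated expression data: genes 0..n_informative-1 are shifted up in class 1."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    X = []
    for y in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        if y == 1:
            for g in range(n_informative):
                row[g] += shift
        X.append(row)
    return X, labels


def class_values(X, y, g, cls):
    """Expression values of gene g for the samples labelled cls."""
    return [row[g] for row, lab in zip(X, y) if lab == cls]


def welch_t(X, y, g):
    """Univariate Welch t-statistic for gene g (class 1 vs class 0)."""
    a, b = class_values(X, y, g, 1), class_values(X, y, g, 0)
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def select_supervised(X, y, k):
    """Supervised univariate selection: top-k genes ranked by |t|."""
    return sorted(range(len(X[0])), key=lambda g: abs(welch_t(X, y, g)), reverse=True)[:k]


def select_random(n_genes, k, seed=1):
    """Random selection: k genes drawn uniformly, ignoring the labels."""
    return random.Random(seed).sample(range(n_genes), k)


def fit_centroids(X, y, genes):
    """Per-gene class means (class 1, class 0) estimated on the training set."""
    return {g: (statistics.mean(class_values(X, y, g, 1)),
                statistics.mean(class_values(X, y, g, 0))) for g in genes}


def score(centroids, X):
    """Linear score: projection onto the centroid difference (higher means class 1)."""
    return [sum((row[g] - (m1 + m0) / 2) * (m1 - m0) for g, (m1, m0) in centroids.items())
            for row in X]


def roc_auc(scores, labels):
    """ROC area via the Mann-Whitney pairwise statistic; ties count as 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def evaluate(genes, train, test):
    """Fit centroids on the training set and return the test-set ROC area."""
    centroids = fit_centroids(*train, genes)
    X_test, y_test = test
    return roc_auc(score(centroids, X_test), y_test)


train = simulate(seed=0)  # simulated "training dataset"
test = simulate(seed=2)   # independently generated "test dataset"
k = 10
print("supervised AUC:", round(evaluate(select_supervised(*train, k), train, test), 3))
print("random AUC:    ", round(evaluate(select_random(len(train[0][0]), k), train, test), 3))
```

Re-running with larger `k` is one way to explore the abstract's observation that the gap between supervised and random selection narrows as more features are used; the sketch makes no claim about the magnitudes reported in the paper.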

Subjects :: Artificial Intelligence
DNA-Binding Proteins genetics
Decision Support Techniques
Diagnostic Errors
Gene Expression Regulation, Neoplastic
Humans
Neoplasms diagnosis
Neoplasms genetics
Oligonucleotide Array Sequence Analysis
Predictive Value of Tests
Proto-Oncogene Proteins c-bcl-6
ROC Curve
Receptors, Estrogen genetics
Research Design
Biomarkers, Tumor
Databases, Genetic
Gene Expression Profiling methods
Pattern Recognition, Automated methods

Details

Language :: English
ISSN :: 1471-2105
Volume :: 8
Database :: MEDLINE
Journal :: BMC bioinformatics
Publication Type :: Academic Journal
Accession number :: 17963508
Full Text :: https://doi.org/10.1186/1471-2105-8-415
