Knowledge-based gene expression classification via matrix factorization
- Source :
- Repositório Científico de Acesso Aberto de Portugal (RCAAP), instacron:RCAAP, Bioinformatics 24, 1688-1697 (2008)
- Publication Year :
- 2008
- Publisher :
- Oxford University Press (OUP), 2008.
Abstract
- Motivation: Modern machine learning methods based on matrix decomposition techniques, such as independent component analysis (ICA) or non-negative matrix factorization (NMF), provide new and efficient tools that are currently being explored for the analysis of gene expression profiles. These exploratory feature extraction techniques yield expression modes (ICA) or metagenes (NMF), and the extracted features are considered indicative of underlying regulatory processes. They can also be applied to the classification of gene expression datasets, either by grouping samples into different categories for diagnostic purposes or by grouping genes into functional categories for further investigation of related metabolic pathways and regulatory networks. Results: In this study we focus on unsupervised matrix factorization techniques and apply ICA and sparse NMF to microarray datasets. The latter monitor the gene expression levels of human peripheral blood cells during differentiation from monocytes to macrophages. We show that these tools are able to identify relevant signatures in the deduced component matrices and to extract informative sets of marker genes from these gene expression profiles. The methods rely on the joint discriminative power of a set of marker genes rather than on single marker genes. With these sets of marker genes, corroborated by leave-one-out or random forest cross-validation, the datasets could easily be classified into related diagnostic categories. The latter correspond to either monocytes versus macrophages or healthy versus Niemann-Pick C disease patients. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: elmar.lang@biologie.uni-regensburg.de
- Subjects :
- Statistics and Probability
Microarray
Computer science
0206 medical engineering
Feature extraction
Gene Expression
02 engineering and technology
Computational biology
Machine learning
Biochemistry
Pattern Recognition, Automated
Non-negative matrix factorization
Matrix decomposition
03 medical and health sciences
Matrix (mathematics)
Discriminative model
Artificial Intelligence
Gene expression
Molecular Biology
Oligonucleotide Array Sequence Analysis
030304 developmental biology
0303 health sciences
Gene Expression Profiling
Original Papers
Independent component analysis
Expression (mathematics)
Computer Science Applications
Random forest
Gene expression profiling
Computational Mathematics
Computational Theory and Mathematics
Artificial intelligence
Algorithms
020602 bioinformatics
Details
- ISSN :
- 1367-4811 and 1367-4803
- Volume :
- 24
- Database :
- OpenAIRE
- Journal :
- Bioinformatics
- Accession number :
- edsair.doi.dedup.....d77d697b1c3e5de1cc0db617b4306a04
- Full Text :
- https://doi.org/10.1093/bioinformatics/btn245