ECFS-DEA: an ensemble classifier-based feature selection for differential expression analysis on expression profiles
- Source :
- BMC Bioinformatics, Vol 21, Iss 1, Pp 1-14 (2020)
- Publication Year :
- 2020
- Publisher :
- BioMed Central, 2020.
Abstract
- Background: Various methods for differential expression analysis are widely used to identify the features that best distinguish between categories of samples. Multiple hypothesis testing may miss explanatory features, each of which may be composed of individually insignificant variables. Multivariate hypothesis testing remains outside the mainstream because of the heavy computational overhead of large-scale matrix operations. Random forest provides a classification-based strategy for calculating variable importance, but it may be unsuitable for some sample distributions.
Results: Building on the idea of an ensemble classifier, we developed ECFS-DEA, a feature selection tool for differential expression analysis on expression profiles. To account for differences in sample distribution, a graphical user interface allows the user to choose among different base classifiers. Inspired by random forest, we propose a common measure of variable importance that is applicable to any base classifier. After the user interactively selects a feature from the sorted individual variables, a projection heatmap is produced using k-means clustering, and an ROC curve is also provided; together, the two plots intuitively demonstrate the effectiveness of the selected feature.
Conclusions: Feature selection through ensemble classifiers helps to select important variables and is therefore applicable to different sample distributions. Experiments on simulated and real data demonstrate the effectiveness of ECFS-DEA for differential expression analysis on expression profiles. The software is available at http://bio-nefu.com/resource/ecfs-dea.
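The abstract describes a variable-importance measure that works with any base classifier, but it does not give the formula. The sketch below is one possible reading, not the ECFS-DEA implementation. It assumes a random-subspace ensemble in which each member is trained on a bootstrap sample restricted to a few randomly chosen variables, and a variable's importance is the mean out-of-bag accuracy of the members that used it. All names (`ensemble_importance`, `nearest_centroid_fit`, `make_demo_data`) and the nearest-centroid base classifier are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def nearest_centroid_fit(X, y):
    """Hypothetical base classifier: store the per-class mean of each variable."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] += 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def nearest_centroid_predict(model, row):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, model[c])))


def ensemble_importance(X, y, n_members=200, subset_size=2, seed=0,
                        fit=nearest_centroid_fit, predict=nearest_centroid_predict):
    """Assumed measure: mean out-of-bag accuracy of the members using each variable.

    Any (fit, predict) pair can be passed in, which is what makes the measure
    independent of the base classifier.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    score_sum = [0.0] * p
    score_cnt = [0] * p
    for _ in range(n_members):
        feats = rng.sample(range(p), subset_size)      # random variable subset
        boot = [rng.randrange(n) for _ in range(n)]    # bootstrap sample indices
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]  # held-out samples
        if not oob:
            continue
        model = fit([[X[i][j] for j in feats] for i in boot], [y[i] for i in boot])
        correct = sum(predict(model, [X[i][j] for j in feats]) == y[i] for i in oob)
        accuracy = correct / len(oob)
        for j in feats:                                # credit every variable used
            score_sum[j] += accuracy
            score_cnt[j] += 1
    return [s / c if c else 0.0 for s, c in zip(score_sum, score_cnt)]


def make_demo_data(n_per_class=20, n_vars=5, shift=4.0, seed=1):
    """Synthetic two-class data in which only variable 0 differs between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
            row[0] += shift * label
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    for j, score in enumerate(ensemble_importance(X, y)):
        print(f"variable {j}: importance {score:.3f}")
```

On the synthetic data, variable 0 should rank first because it is the only variable that separates the two classes. Swapping in another base classifier only requires passing a different `fit` and `predict` pair.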
- Subjects :
- Multivariate statistics
Expression profiles
Computer science
Feature selection
lcsh:Computer applications to medicine. Medical informatics
Biochemistry
03 medical and health sciences
0302 clinical medicine
Accumulation
Structural Biology
Cluster analysis
Differential expression analysis
lcsh:QH301-705.5
Molecular Biology
030304 developmental biology
Statistical hypothesis testing
0303 health sciences
business.industry
Applied Mathematics
Gene Expression Profiling
Pattern recognition
Classification
Computer Science Applications
Random forest
lcsh:Biology (General)
Sampling distribution
ROC Curve
030220 oncology & carcinogenesis
lcsh:R858-859.7
Artificial intelligence
business
Classifier (UML)
Software
Details
- Language :
- English
- ISSN :
- 1471-2105
- Volume :
- 21
- Database :
- OpenAIRE
- Journal :
- BMC Bioinformatics
- Accession number :
- edsair.doi.dedup.....ca28a2481897b33d32db406e5844d6c2