Unsupervised feature selection algorithm for multiclass cancer classification of gene expression RNA-Seq data
- Source :
- Genomics. 112:1916-1925
- Publication Year :
- 2020
- Publisher :
- Elsevier BV, 2020.
Abstract
- This paper presents a Grouping Genetic Algorithm (GGA) to solve a maximally diverse grouping problem. It is applied to the classification of an unbalanced database of 801 gene expression RNA-Seq samples covering 5 types of cancer. Each sample comprises 20,531 genes. The GGA extracts several groups of genes that achieve high accuracy in multiclass classification. Accuracy was evaluated with an Extreme Learning Machine algorithm and was found to be slightly higher in balanced databases than in unbalanced ones. The final classification decision is made through a weighted majority vote among the groups of features. The proposed algorithm ultimately selects 49 genes and classifies samples with an average accuracy of 98.81% and a standard deviation of 0.0174.
- Subjects :
- 0106 biological sciences
0303 health sciences
Cancer classification
RNA-Seq
Feature selection
Biology
01 natural sciences
Standard deviation
Gene Expression Regulation, Neoplastic
03 medical and health sciences
Neoplasms
Genetic algorithm
Gene expression
Genetics
Humans
Multiple classification
Algorithm
Unsupervised Machine Learning
030304 developmental biology
010606 plant biology & botany
Extreme learning machine
Details
- ISSN :
- 0888-7543
- Volume :
- 112
- Database :
- OpenAIRE
- Journal :
- Genomics
- Accession number :
- edsair.doi.dedup.....b04d2619163e9fdec0e2b8fb11959223
- Full Text :
- https://doi.org/10.1016/j.ygeno.2019.11.004
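
The weighted majority vote mentioned in the abstract can be illustrated with a minimal Python sketch. The record does not give the paper's weighting scheme, so this sketch assumes each gene group casts one vote for a class and that the vote is weighted by a per-group score such as validation accuracy. The function name `weighted_majority_vote`, the class labels, and the weights are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Combine one class prediction per gene group into a final label.

    predictions: list of class labels, one per group of features.
    weights: list of non-negative numbers, one per group (assumed here
             to be something like each group's validation accuracy).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")

    # Sum the weights of all groups that voted for each class.
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight

    # Return the class with the largest total weight. On a tie, the class
    # that received a vote first wins (dict insertion order).
    return max(totals, key=totals.get)


# Hypothetical example: three gene groups vote on one sample.
group_predictions = ["type_A", "type_A", "type_B"]
group_weights = [0.2, 0.2, 0.9]
final_label = weighted_majority_vote(group_predictions, group_weights)
```

In this example two groups vote for `type_A`, but their combined weight (0.4) is lower than the single vote for `type_B` (0.9), so the weighted vote differs from a simple head count.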