A novel dimension reduction algorithm based on weighted kernel principal analysis for gene expression data
- Source :
- PLoS ONE, Vol 16, Iss 10, p e0258326 (2021)
- Publication Year :
- 2020
Abstract
- Gene expression data has the characteristics of high dimensionality and a small sample size and contains a large number of redundant genes unrelated to a disease. Applying machine learning directly to classify this type of data not only incurs a great time cost but also sometimes fails to improve classification performance. To counter this problem, this paper proposes a dimension-reduction algorithm based on weighted kernel principal component analysis (WKPCA), which constructs kernel function weights according to kernel matrix eigenvalues and combines multiple kernel functions to reduce the feature dimensions. To further improve the dimension-reduction efficiency of WKPCA, t-class kernel functions are constructed, and corresponding theoretical proofs are given. Moreover, the cumulative optimal performance rate is constructed to measure the overall performance of WKPCA combined with machine learning algorithms. Naive Bayes, K-nearest neighbour, random forest, iterative random forest and support vector machine approaches are used as classifiers to analyse 6 real gene expression datasets. Compared with the all-variable model, linear principal component dimension reduction and single kernel function dimension reduction, the results show that the classification performance of the 5 machine learning methods mentioned above can be improved effectively by WKPCA dimension reduction.
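The general idea in the abstract (weight several kernels by their kernel matrix eigenvalues, combine them, then reduce dimensions) can be sketched as below. This is an illustrative sketch only, not the authors' algorithm: the abstract does not give the weighting formula, so the rule used here (each kernel's weight is proportional to the share of its spectrum captured by the leading eigenvalues) is an assumption, as are the function names `rbf_kernel`, `poly_kernel`, `center_kernel` and `weighted_kpca`, the kernel parameters, and the random test data. The paper's t-class kernels are not reproduced.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def poly_kernel(X, degree=2, coef0=1.0):
    """Polynomial kernel matrix for the rows of X."""
    return (X @ X.T + coef0) ** degree


def center_kernel(K):
    """Centre a kernel matrix in feature space."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    return K - one @ K - K @ one + one @ K @ one


def weighted_kpca(X, kernels, n_components=2):
    """Combine several kernels with eigenvalue-based weights, then run kernel PCA.

    ASSUMPTION: the weight of each kernel is the fraction of its (centred)
    spectrum explained by its top `n_components` eigenvalues, normalised to
    sum to 1. The paper's actual weighting rule may differ.
    """
    centered = [center_kernel(k(X)) for k in kernels]

    scores = []
    for K in centered:
        vals = np.clip(np.linalg.eigvalsh(K)[::-1], 0.0, None)  # descending, non-negative
        scores.append(vals[:n_components].sum() / vals.sum())
    weights = np.array(scores) / np.sum(scores)

    # Weighted sum of the centred kernel matrices.
    K_mix = sum(w * K for w, K in zip(weights, centered))

    # Standard kernel PCA on the combined matrix.
    vals, vecs = np.linalg.eigh(K_mix)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = np.clip(vals[idx], 0.0, None), vecs[:, idx]
    Z = vecs * np.sqrt(vals)  # projections of the training samples
    return Z, weights


# Synthetic stand-in for "few samples, many genes": 30 samples, 200 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 200))
Z, weights = weighted_kpca(
    X, [lambda A: rbf_kernel(A, gamma=1.0 / 200), poly_kernel], n_components=2
)
print(Z.shape, weights)
```

The reduced matrix `Z` would then be passed to a downstream classifier in place of the full expression matrix; projecting unseen samples would additionally require the cross-kernel between test and training data, which this sketch omits.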
- Subjects :
- Data Analysis
Computer and Information Sciences
Computer science
Science
Kernel Functions
Gene Expression
Research and Analysis Methods
Kernel principal component analysis
Machine Learning
Naive Bayes classifier
Machine Learning Algorithms
Kernel Methods
Mathematical and Statistical Techniques
Artificial Intelligence
Genetics
Medicine and Health Sciences
Humans
Statistical Methods
Operator Theory
Principal Component Analysis
Multidisciplinary
Dimensionality reduction
Applied Mathematics
Simulation and Modeling
Statistics
Biology and Life Sciences
Cancers and Neoplasms
Eigenvalues
Random forest
Support vector machine
Algebra
Gene Expression Regulation
Linear Algebra
Oncology
Dimensional reduction
Kernel (statistics)
Area Under Curve
Principal component analysis
Physical Sciences
Multivariate Analysis
Medicine
Algorithm
Algorithms
Mathematics
Research Article
Details
- ISSN :
- 19326203
- Volume :
- 16
- Issue :
- 10
- Database :
- OpenAIRE
- Journal :
- PloS one
- Accession number :
- edsair.doi.dedup.....ed42e3e43c9a1ff913489c6e67a73031