
Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification.

Authors :: Sun, Lin
Zhang, Xiaoyu
Qian, Yuhua
Xu, Jiucheng
Zhang, Shiguang
Source :: Information Sciences. Oct 2019, Vol. 502, p18-41. 24p.
Publication Year :: 2019
Abstract: Gene expression data classification is an important technology for cancer diagnosis in bioinformatics and has been widely researched. Due to the large number of genes and the small sample size in gene expression data, feature selection based on neighborhood rough sets is a key step for improving the performance of gene expression data classification. However, some quantitative measures of feature sets may be nonmonotonic in neighborhood rough sets, and many feature selection methods based on evaluation functions yield high cardinality and low predictive accuracy. Therefore, investigating effective and efficient heuristic reduction algorithms is necessary. In this paper, a novel feature selection method based on neighborhood rough sets using neighborhood entropy-based uncertainty measures for cancer classification from gene expression data is proposed. First, some neighborhood entropy-based uncertainty measures are investigated for handling the uncertainty and noise of neighborhood decision systems. Then, to fully reflect the decision-making ability of attributes, the neighborhood credibility and neighborhood coverage degrees are defined and introduced into decision neighborhood entropy and mutual information, which are proven to be nonmonotonic. Moreover, some of the properties and relationships among these measures are derived, which is helpful for understanding the essence of the knowledge content and the uncertainty of neighborhood decision systems. Finally, the Fisher score method is employed to preliminarily eliminate irrelevant genes to significantly reduce complexity, and a heuristic feature selection algorithm with low computational complexity is presented to improve the performance of cancer classification using gene expression data. 
Experiments on ten gene expression datasets show that our proposed algorithm is indeed efficient and outperforms other related methods in terms of the number of selected genes and the classification accuracy, especially as the size of the genes increases. [ABSTRACT FROM AUTHOR]
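The abstract describes a two-stage pipeline: a Fisher score filter that discards irrelevant genes, then a heuristic forward selection driven by neighborhood entropy-based measures. The Python sketch below shows that pipeline shape under stated assumptions. It is not the authors' algorithm. In place of the paper's credibility- and coverage-weighted decision neighborhood entropy, which the abstract does not define, it uses the common neighborhood mutual information NMI(B; D) = (1/n) Σ_i log(n·|δ_B(x_i) ∩ [x_i]_D| / (|δ_B(x_i)|·|[x_i]_D|)), with a Chebyshev-distance δ-neighborhood. The function names, the defaults `delta=0.15` and `top_k=50`, and the stopping threshold `eps` are hypothetical. Features are assumed to be pre-scaled to [0, 1] so that a single `delta` is meaningful across genes.

```python
import math
from collections import Counter


def fisher_scores(X, y):
    """Fisher score of each feature: between-class scatter over within-class scatter."""
    n, m = len(X), len(X[0])
    class_sizes = Counter(y)
    scores = []
    for j in range(m):
        col = [row[j] for row in X]
        mu = sum(col) / n
        between = within = 0.0
        for c, n_c in class_sizes.items():
            vals = [col[i] for i in range(n) if y[i] == c]
            mu_c = sum(vals) / n_c
            between += n_c * (mu_c - mu) ** 2
            within += sum((v - mu_c) ** 2 for v in vals)  # equals n_c * class variance
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfect separator if class means differ, else constant.
            scores.append(math.inf if between > 0 else 0.0)
    return scores


def neighborhoods(X, feats, delta):
    """delta-neighborhood of every sample under the Chebyshev distance on `feats`.

    With no features selected, every sample lies in every neighborhood.
    """
    n = len(X)
    return [
        {k for k in range(n) if all(abs(X[i][f] - X[k][f]) <= delta for f in feats)}
        for i in range(n)
    ]


def neighborhood_mi(X, y, feats, delta):
    """NMI(B; D) = (1/n) * sum_i log(n * |nbr_i & D_i| / (|nbr_i| * |D_i|))."""
    n = len(X)
    class_sizes = Counter(y)
    total = 0.0
    for i, nbr in enumerate(neighborhoods(X, feats, delta)):
        same_class = sum(1 for k in nbr if y[k] == y[i])  # >= 1: x_i is its own neighbor
        total += math.log(n * same_class / (len(nbr) * class_sizes[y[i]]))
    return total / n


def select_features(X, y, delta=0.15, top_k=50, eps=1e-6):
    """Fisher-score prefilter, then greedy forward selection maximizing NMI."""
    scores = fisher_scores(X, y)
    # Keep only the top_k genes by Fisher score to shrink the search space.
    candidates = sorted(range(len(scores)), key=lambda j: -scores[j])[:top_k]
    selected, best = [], 0.0
    while candidates:
        gains = {a: neighborhood_mi(X, y, selected + [a], delta) for a in candidates}
        a_best = max(gains, key=gains.get)
        if gains[a_best] - best <= eps:  # stop when no candidate adds information
            break
        selected.append(a_best)
        candidates.remove(a_best)
        best = gains[a_best]
    return selected
```

Each NMI evaluation compares all pairs of samples, so `top_k` bounds the cost of the greedy loop; this mirrors the abstract's point that the Fisher filter is applied first to reduce complexity before the entropy-driven search.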

Subjects :: *CANCER diagnosis
*FEATURE selection
*GENE expression
*BIOINFORMATICS
*COMPUTER algorithms

Details

Language :: English
ISSN :: 0020-0255
Volume :: 502
Database :: Academic Search Index
Journal :: Information Sciences
Publication Type :: Periodical
Accession number :: 137852031
Full Text :: https://doi.org/10.1016/j.ins.2019.05.072
