
Uncertainty measurement for single cell RNA-seq data based on class-consistent technology with application to semi-supervised gene selection.

Authors :
Zhang, Qinli
Zhao, Zhengwei
Liu, Fang
Li, Zhaowen
Source :
Applied Soft Computing; Oct2023, Vol. 146, pN.PAG-N.PAG, 1p
Publication Year :
2023

Abstract

Because of the high cost of label collection, researchers are now faced with large amounts of partially labeled gene expression data (plge-data). Single cell RNA-seq data (scrs-data) are an important kind of plge-data and reflect the abundance of gene transcript mRNA measured directly or indirectly in cells. For convenience, a decision information system (DIS) based on scrs-data is called a single cell gene decision space (scgd-space). Because scrs-data are high-dimensional, feature selection must be performed before clustering and classification. Existing feature selection methods based on an equivalence relation are ineffective for the scgd-space, because requiring strict equality between information values is too restrictive. To solve this problem, this paper studies uncertainty measurement of the scgd-space based on class-consistent technology and applies it to semi-supervised gene selection. Class-consistent technology replaces equality with approximate equality between two expression values at a gene. Based on this technology, class-consistent and non-class-consistent relations on the cell set of the scgd-space are first established. Then, the scgd-space (O, A, d) is divided into a labeled space (O_l, A, d) and an unlabeled space (O_u, A, d). Next, four metrics of importance are defined on each gene subset of (O, A, d). Each metric is a weighted sum of terms computed on (O_l, A, d) and (O_u, A, d) from the established relations, with weights determined by the missing rate of labels, and can be used to measure the uncertainty of (O, A, d). In addition, as an application of the four metrics to the scgd-space, a semi-supervised gene selection algorithm is designed. Finally, experimental results and statistical tests on 16 large-scale scrs-data sets show that the defined metrics can effectively measure the uncertainty of the scgd-space.
The designed algorithm, which achieves a high reduction rate, outperforms some state-of-the-art feature selection algorithms on eight performance evaluation indicators.
• We establish a class-consistent relation on the cell set of an scgd-space based on class-consistent technology.
• We define four metrics of importance on each gene subset of an scgd-space.
• We design a semi-supervised gene selection algorithm in an scgd-space.
• Experimental results show that the designed algorithm outperforms some state-of-the-art feature selection algorithms.
[ABSTRACT FROM AUTHOR]
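The abstract does not give the formulas for the relations or for the four metrics, so the Python sketch below only illustrates the general idea under explicit assumptions. It assumes that "approximate equality" at gene a means |f(x, a) - f(y, a)| <= delta[a] for a hypothetical per-gene threshold delta[a], and it substitutes two simple stand-in measures: a label-consistency ratio on the labeled cells and a pairwise discernibility ratio on the unlabeled cells. These are combined using the missing rate of labels r = |O_u| / |O| as the weight on the unlabeled part. The measures, the direction of the weighting, and all function names are hypothetical and are not the paper's four metrics.

```python
from typing import List, Optional, Sequence, Set

Matrix = Sequence[Sequence[float]]  # rows = cells, columns = genes


def tolerance_classes(X: Matrix, genes: Sequence[int],
                      delta: Sequence[float]) -> List[Set[int]]:
    """Class of cell i: every cell j whose expression is approximately equal
    to cell i's at each gene in `genes` (assumed rule: |diff| <= delta[a])."""
    n = len(X)
    return [
        {j for j in range(n)
         if all(abs(X[i][a] - X[j][a]) <= delta[a] for a in genes)}
        for i in range(n)
    ]


def labeled_consistency(classes: List[Set[int]],
                        labels: Sequence[Optional[int]]) -> float:
    """Hypothetical labeled-part measure: share of labeled cells whose class
    contains no labeled cell carrying a different label."""
    lab = [i for i, y in enumerate(labels) if y is not None]
    if not lab:
        return 0.0
    ok = sum(
        all(labels[j] == labels[i] for j in classes[i] if labels[j] is not None)
        for i in lab
    )
    return ok / len(lab)


def unlabeled_discernibility(classes: List[Set[int]],
                             labels: Sequence[Optional[int]]) -> float:
    """Hypothetical unlabeled-part measure: share of ordered pairs of distinct
    unlabeled cells that are NOT approximately equal."""
    unl = [i for i, y in enumerate(labels) if y is None]
    m = len(unl)
    if m < 2:
        return 0.0
    related = sum(1 for i in unl for j in unl if i != j and j in classes[i])
    return 1.0 - related / (m * (m - 1))


def semi_supervised_metric(X: Matrix, labels: Sequence[Optional[int]],
                           genes: Sequence[int], delta: Sequence[float]) -> float:
    """Weighted sum of both parts; the weight r is the missing rate of labels
    (assumed: labeled part weighted by 1 - r, unlabeled part by r)."""
    classes = tolerance_classes(X, genes, delta)
    r = sum(y is None for y in labels) / len(labels)
    return ((1 - r) * labeled_consistency(classes, labels)
            + r * unlabeled_discernibility(classes, labels))
```

Under this weighting, the unlabeled part contributes more as labels become scarcer, which is one plausible reading of "determined by the missing rate of labels".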

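One common way to turn such an importance metric into a gene selection procedure is a greedy forward search, sketched below in Python. This is an assumption and not the authors' algorithm: the metric is a hypothetical stand-in (label consistency on labeled cells plus pairwise discernibility on unlabeled cells, weighted by the missing rate of labels, with approximate equality taken as |diff| <= delta[a]), and the stopping rule (stop once the selected subset reaches the metric value of the full gene set) is borrowed from rough-set attribute reduction. The reduction rate is computed with the usual definition 1 - |selected| / |A|.

```python
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[float]]  # rows = cells, columns = genes


def weighted_metric(X: Matrix, labels: Sequence[Optional[int]],
                    genes: Sequence[int], delta: Sequence[float]) -> float:
    """Hypothetical stand-in metric, NOT one of the paper's four metrics."""
    n = len(X)

    def related(i: int, j: int) -> bool:
        # Assumed approximate equality at every gene in the subset.
        return all(abs(X[i][a] - X[j][a]) <= delta[a] for a in genes)

    lab = [i for i in range(n) if labels[i] is not None]
    unl = [i for i in range(n) if labels[i] is None]
    r = len(unl) / n  # missing rate of labels

    # Labeled part: share of labeled cells related only to same-label labeled cells.
    m_l = (sum(all(labels[j] == labels[i] for j in lab if related(i, j))
               for i in lab) / len(lab)
           if lab else 0.0)
    # Unlabeled part: share of ordered pairs of distinct unlabeled cells that differ.
    m = len(unl)
    m_u = (1.0 - sum(1 for i in unl for j in unl if i != j and related(i, j))
           / (m * (m - 1))
           if m > 1 else 0.0)
    return (1 - r) * m_l + r * m_u


def greedy_gene_selection(X: Matrix, labels: Sequence[Optional[int]],
                          delta: Sequence[float], tol: float = 1e-12) -> List[int]:
    """Assumed strategy: add the best-scoring gene until the metric of the
    selected subset reaches the metric of the full gene set."""
    all_genes = list(range(len(X[0])))
    target = weighted_metric(X, labels, all_genes, delta)
    selected: List[int] = []
    current = weighted_metric(X, labels, selected, delta)
    while current < target - tol:
        best_gene, best_score = None, current
        for a in all_genes:
            if a in selected:
                continue
            score = weighted_metric(X, labels, selected + [a], delta)
            if score > best_score:
                best_gene, best_score = a, score
        if best_gene is None:  # no remaining gene improves the metric
            break
        selected.append(best_gene)
        current = best_score
    return selected


def reduction_rate(selected: Sequence[int], n_genes: int) -> float:
    """Share of genes discarded: 1 - |selected| / |A|."""
    return 1 - len(selected) / n_genes
```

Each step re-evaluates the metric for every remaining gene, so this naive version costs roughly O(|A|^2 |O|^2) metric work in the worst case; it is meant to show the control flow, not to scale to large scrs-data sets.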
Details

Language :
English
ISSN :
15684946
Volume :
146
Database :
Supplemental Index
Journal :
Applied Soft Computing
Publication Type :
Academic Journal
Accession number :
171902101
Full Text :
https://doi.org/10.1016/j.asoc.2023.110645