1. Cell Subtype Classification via Representation Learning Based on a Denoising Autoencoder for Single-Cell RNA Sequencing
- Author
Heejoon Chae, Joungmin Choi, and Je-Keun Rhee
- Subjects
General Computer Science, Feature extraction, 03 medical and health sciences, 0302 clinical medicine, scRNA-seq, Classifier (linguistics), Feature (machine learning), General Materials Science, Cluster analysis, Cell subtype, 030304 developmental biology, 0303 health sciences, Artificial neural network, business.industry, Dimensionality reduction, General Engineering, Pattern recognition, single-cell, classification, Softmax function, gene expression, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, business, lcsh:TK1-9971, Feature learning, 030217 neurology & neurosurgery
- Abstract
Identifying single-cell subtypes from single-cell RNA sequencing (scRNA-seq) data is a fundamental step in understanding heterogeneous populations composed of many cells. Previously, cell subtypes were mainly identified with dimension-reduction and clustering approaches that group cells with similar expression profiles. However, supervised classification approaches have been widely used because they are more robust to noise and annotate the subtype of each cell systematically. Recently, deep neural network (DNN) models have been applied in many fields, including biology. By capturing complex relationships between sample features and target outcomes, a DNN model enables significant performance improvements in biological data-mining analyses. In this paper, we constructed a DNN model called scDAE, which identifies single-cell subtypes by combining classification with representative feature extraction through a multilayer denoising autoencoder (DAE). The features learned by the DAE were further tuned by fully connected layers with a softmax classifier. The model was compared against four state-of-the-art cell subtype identification methods and two conventional machine learning algorithms. Across multiple tests, scDAE significantly outperformed the competing methods, especially on data sets with many cell subtypes and high levels of noise. Cell features extracted by the proposed model clustered clearly by subtype. These results indicate that the proposed model is effective at identifying single-cell subtypes and the molecular signatures representative of each subtype. scDAE is publicly available at https://github.com/cbi-bioinfo/scDAE.
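The abstract describes two stages: unsupervised feature learning with a multilayer denoising autoencoder, then supervised tuning with fully connected layers and a softmax classifier. The NumPy sketch below illustrates that general technique only; it is not the scDAE code from the repository above. The masking-noise corruption, sigmoid encoders with linear decoders, layer sizes, learning rates, and the synthetic cells-by-genes matrix are all assumptions, and the helper names (`train_dae_layer`, `fine_tune`, `predict`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_dae_layer(X, n_hidden, rng, mask_rate=0.2, lr=0.1, epochs=200):
    """Train one DAE layer: corrupt X, encode (sigmoid), decode (linear),
    and minimise squared error against the CLEAN X. Returns the encoder
    weights and the per-epoch denoising loss."""
    n, d = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, d))
    b2 = np.zeros(d)
    history = []
    for _ in range(epochs):
        # Assumed corruption: masking noise that zeroes a random fraction of entries.
        X_noisy = X * (rng.random(X.shape) >= mask_rate)
        H = sigmoid(X_noisy @ W1 + b1)
        err = (H @ W2 + b2) - X
        history.append(0.5 * np.sum(err ** 2) / n)
        # Backpropagate the mean per-cell squared reconstruction error.
        d_out = err / n
        dZ = (d_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X_noisy.T @ dZ)
        b1 -= lr * dZ.sum(axis=0)
    return (W1, b1), history


def forward(X, layers):
    """Return the activations of every encoder layer, starting with the input."""
    acts = [X]
    for W, b in layers:
        acts.append(sigmoid(acts[-1] @ W + b))
    return acts


def fine_tune(X, y, layers, n_classes, rng, lr=0.5, epochs=500):
    """Stack a softmax layer on the pretrained encoder and update all layers
    jointly with cross-entropy loss (the supervised tuning stage)."""
    n = X.shape[0]
    Y = np.eye(n_classes)[y]  # one-hot subtype labels
    Wc = rng.normal(0.0, 0.1, size=(layers[-1][0].shape[1], n_classes))
    bc = np.zeros(n_classes)
    layers = [(W.copy(), b.copy()) for W, b in layers]
    for _ in range(epochs):
        acts = forward(X, layers)
        d_logits = (softmax(acts[-1] @ Wc + bc) - Y) / n
        d_act = d_logits @ Wc.T  # computed before Wc is updated
        Wc -= lr * (acts[-1].T @ d_logits)
        bc -= lr * d_logits.sum(axis=0)
        for k in range(len(layers) - 1, -1, -1):
            W, b = layers[k]
            dZ = d_act * acts[k + 1] * (1.0 - acts[k + 1])
            d_act = dZ @ W.T  # uses the pre-update weights
            layers[k] = (W - lr * (acts[k].T @ dZ), b - lr * dZ.sum(axis=0))
    return layers, (Wc, bc)


def predict(X, layers, head):
    Wc, bc = head
    return np.argmax(forward(X, layers)[-1] @ Wc + bc, axis=1)


# Synthetic stand-in for a log-normalised cells x genes matrix: 3 subtypes, 20 genes.
rng = np.random.default_rng(0)
centers = rng.uniform(0.0, 4.0, size=(3, 20))
y = np.repeat(np.arange(3), 50)
X = centers[y] + rng.normal(0.0, 0.3, size=(150, 20))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # per-gene standardisation

enc1, hist1 = train_dae_layer(X, 16, rng)
H1 = sigmoid(X @ enc1[0] + enc1[1])  # clean encodings feed the next DAE layer
enc2, hist2 = train_dae_layer(H1, 8, rng)
layers, head = fine_tune(X, y, [enc1, enc2], 3, rng)
acc = np.mean(predict(X, layers, head) == y)
```

Each layer is pretrained greedily on the clean encodings of the layer below, which is the classic stacked-DAE recipe; fine-tuning then adjusts every encoder weight together with the softmax layer, so the learned features shift toward separating the labelled subtypes.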
- Published
- 2021