Back to Search Start Over

PredGCN: a Pruning-enabled Gene-Cell Net for automatic cell annotation of single cell transcriptome data.

Authors :
Qi, Qi
Wang, Yunhe
Huang, Yujian
Fan, Yi
Li, Xiangtao
Source :
Bioinformatics. Jul2024, Vol. 40 Issue 7, p1-11. 11p.
Publication Year :
2024

Abstract

Motivation The annotation of cell types from single-cell transcriptomics is essential for understanding the biological identity and functionality of cellular populations. Although manual annotation remains the gold standard, the advent of automatic pipelines has become crucial for scalable, unbiased, and cost-effective annotations. Nonetheless, the effectiveness of these automatic methods, particularly those employing deep learning, significantly depends on the architecture of the classifier and the quality and diversity of the training datasets. Results To address these limitations, we present a Pruning-enabled Gene-Cell Net (PredGCN) incorporating a Coupled Gene-Cell Net (CGCN) to enable representation learning and information storage. PredGCN integrates a Gene Splicing Net (GSN) and a Cell Stratification Net (CSN), employing a pruning operation (PrO) to dynamically tackle the complexity of heterogeneous cell identification. Among them, GSN leverages multiple statistical and hypothesis-driven feature extraction methods to selectively assemble genes with specificity for scRNA-seq data while CSN unifies elements based on diverse region demarcation principles, exploiting the representations from GSN and precise identification from different regional homogeneity perspectives. Furthermore, we develop a multi-objective Pareto pruning operation (Pareto PrO) to expand the dynamic capabilities of CGCN, optimizing the sub-network structure for accurate cell type annotation. Multiple comparison experiments on real scRNA-seq datasets from various species have demonstrated that PredGCN surpasses existing state-of-the-art methods, including its scalability to cross-species datasets. Moreover, PredGCN can uncover unknown cell types and provide functional genomic analysis by quantifying the influence of genes on cell clusters, bringing new insights into cell type identification and characterizing scRNA-seq data from different perspectives. Availability and implementation The source code is available at https://github.com/IrisQi7/PredGCN and test data is available at https://figshare.com/articles/dataset/PredGCN/25251163. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
13674803
Volume :
40
Issue :
7
Database :
Academic Search Index
Journal :
Bioinformatics
Publication Type :
Academic Journal
Accession number :
178887784
Full Text :
https://doi.org/10.1093/bioinformatics/btae421