JOINT for large-scale single-cell RNA-sequencing analysis via soft-clustering and parallel computing
- Source :
- BMC Genomics, Vol 22, Iss 1, Pp 1-16 (2021)
- Publication Year :
- 2021
- Publisher :
- Springer Science and Business Media LLC, 2021.
Abstract
- Background: Single-cell RNA-Sequencing (scRNA-Seq) has provided single-cell level insights into complex biological processes. However, the high frequency of gene expression detection failures in scRNA-Seq data makes it challenging to reliably identify cell types and Differentially Expressed Genes (DEG). Moreover, with the explosive growth of single-cell data generated with the 10x Genomics protocol, existing methods will soon reach their computational limits due to scalability issues. The single-cell transcriptomics field urgently needs new tools and frameworks to facilitate large-scale single-cell analysis. Results: To improve the accuracy, robustness, and speed of scRNA-Seq data processing, we propose a generalized zero-inflated negative binomial mixture model, “JOINT,” that can perform probability-based cell-type discovery and DEG analysis simultaneously without the need for imputation. JOINT performs soft-clustering for cell-type identification by computing, for each cell, the probability of membership in each cell type, so a cell can belong to multiple cell types with different probabilities. This differs fundamentally from existing hard-clustering methods, where each cell can belong to only one cell type. The soft-clustering component of the algorithm significantly improves the accuracy and robustness of single-cell analysis, especially when the scRNA-Seq datasets are noisy and contain a large number of dropout events. Moreover, JOINT determines the optimal number of cell types automatically rather than requiring it to be specified empirically. Fitting the proposed model is an unsupervised learning problem, which is solved using the Expectation-Maximization (EM) algorithm. The EM algorithm is implemented in the TensorFlow deep learning framework, which dramatically accelerates data analysis through parallel GPU computing. Conclusions: Taken together, the JOINT algorithm is accurate and efficient for large-scale scRNA-Seq data analysis via parallel computing. The Python package that we have developed can be readily applied to aid future advances in parallel computing-based single-cell algorithms and research in various biological and biomedical fields.
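The soft-clustering idea in the abstract can be illustrated with a small sketch of the E-step of an EM fit for a zero-inflated negative binomial (ZINB) mixture. This is not the JOINT package or its TensorFlow implementation. The function names, the toy parameter values, and the simplification of one dispersion `r` and one dropout probability `pi` shared across all genes and cell types are assumptions made only for illustration.

```python
import math

def nb_logpmf(x, mu, r):
    """Log-probability of count x under a negative binomial with mean mu and dispersion r."""
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(r / (r + mu)) + x * math.log(mu / (r + mu)))

def zinb_logpmf(x, mu, r, pi):
    """Log-probability under a zero-inflated NB; pi is the extra-zero (dropout) probability."""
    if x == 0:
        # A zero can come from a dropout event or from the NB component itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb_logpmf(0, mu, r)))
    return math.log(1.0 - pi) + nb_logpmf(x, mu, r)

def soft_assign(cell, weights, mus, r, pi):
    """E-step for one cell: posterior probability of each cell type (hypothetical helper)."""
    log_post = []
    for w, mu_k in zip(weights, mus):
        # Genes are treated as independent given the cell type.
        ll = sum(zinb_logpmf(x, m, r, pi) for x, m in zip(cell, mu_k))
        log_post.append(math.log(w) + ll)
    # Normalise in log space (log-sum-exp) to avoid underflow.
    top = max(log_post)
    unnorm = [math.exp(v - top) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Toy example: two cell types, three genes (all values are made up).
weights = [0.5, 0.5]                        # mixing proportions
mus = [[10.0, 1.0, 0.5], [1.0, 8.0, 6.0]]   # per-type mean expression per gene
cell = [12, 0, 1]                           # observed counts for one cell
probs = soft_assign(cell, weights, mus, r=2.0, pi=0.3)
print(probs)
```

Each entry of `probs` is the probability that the cell belongs to the corresponding cell type. A hard-clustering method would keep only the largest entry, whereas a soft-clustering method carries the full vector into the M-step, where it weights the parameter updates.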
- Subjects :
- Parallel computing
Fuzzy clustering
lcsh:QH426-470
lcsh:Biotechnology
Biology
03 medical and health sciences
0302 clinical medicine
Robustness (computer science)
Soft-clustering
lcsh:TP248.13-248.65
Expectation–maximization algorithm
Genetics
Cluster Analysis
DEG
RNA-Seq
Probability
030304 developmental biology
Single-cell
0303 health sciences
JOINT
Sequence Analysis, RNA
Methodology Article
Dropout
Gene Expression Profiling
Correction
Deep learning
Mixture model
lcsh:Genetics
Identification (information)
Scalability
RNA
Unsupervised learning
Single-Cell Analysis
General-purpose computing on graphics processing units
030217 neurology & neurosurgery
Biotechnology
Details
- ISSN :
- 14712164
- Volume :
- 22
- Database :
- OpenAIRE
- Journal :
- BMC Genomics
- Accession number :
- edsair.doi.dedup.....181fb121a773959ed50b3dc17e814cfa
- Full Text :
- https://doi.org/10.1186/s12864-020-07302-6