A multitask clustering approach for single-cell RNA-seq analysis in Recessive Dystrophic Epidermolysis Bullosa
- Source :
- PLoS Computational Biology, Vol 14, Iss 4, p e1006053 (2018)
- Publication Year :
- 2018
- Publisher :
- Public Library of Science (PLoS), 2018.
Abstract
- Single-cell RNA sequencing (scRNA-seq) has been widely applied to discover new cell types by detecting sub-populations in a heterogeneous group of cells. Since scRNA-seq experiments have lower read coverage/tag counts and introduce more technical biases compared to bulk RNA-seq experiments, the limited number of sampled cells combined with the experimental biases and other dataset-specific variations presents a challenge to cross-dataset analysis and discovery of relevant biological variations across multiple cell populations. In this paper, we introduce a method of variance-driven multitask clustering of single-cell RNA-seq data (scVDMC) that utilizes multiple single-cell populations from biological replicates or different samples. scVDMC clusters single cells in multiple scRNA-seq experiments of similar cell types and markers but varying expression patterns such that the scRNA-seq data are better integrated than typical pooled analyses which only increase the sample size. By controlling the variance among the cell clusters within each dataset and across all the datasets, scVDMC detects cell sub-populations in each individual experiment with shared cell-type markers but varying cluster centers among all the experiments. Applied to two real scRNA-seq datasets with several replicates and one large-scale droplet-based dataset on three patient samples, scVDMC more accurately detected cell populations and known cell markers than pooled clustering and other recently proposed scRNA-seq clustering methods. In the case study applied to in-house Recessive Dystrophic Epidermolysis Bullosa (RDEB) scRNA-seq data, scVDMC revealed several new cell types and unknown markers validated by flow cytometry. MATLAB/Octave code is available at https://github.com/kuanglab/scVDMC.
Author summary
- scRNA-seq enables detailed profiling of heterogeneous cell populations and can be used to reveal lineage relationships or discover new cell types. In the literature, there has been little effort directed towards developing computational methods for cross-population transcriptome analysis of multiple single-cell populations. The cross-cell-population clustering problem is different from the traditional clustering problem because single-cell populations can be collected from different patients, different samples of a tissue, or different experimental replicates. The accompanying biological and technical variation tends to dominate the signals for clustering the pooled single cells from the multiple populations. In this work, we have developed a multitask clustering method to address the cross-population clustering problem. The method simultaneously clusters each individual cell population and controls variance among the cell-type cluster centers within each cell population and across the cell populations. We demonstrate that our multitask clustering method significantly improves clustering accuracy and marker discovery in three public scRNA-seq datasets and also apply the method to an in-house Recessive Dystrophic Epidermolysis Bullosa (RDEB) dataset. Our results make it evident that multitask clustering is a promising new approach for cross-population analysis of scRNA-seq data.
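The idea of clustering each population separately while limiting how far matching cluster centers drift apart across populations can be illustrated with a toy sketch. This is not the scVDMC objective or the authors' MATLAB/Octave code: it is a simplified penalized k-means, and the function name `multitask_kmeans`, the penalty weight `lam`, and the example data are all hypothetical, chosen only for illustration. Marker selection, which the abstract also mentions, is left out.

```python
def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def multitask_kmeans(datasets, init_centers, lam=1.0, n_iter=20):
    """Toy multitask k-means (illustrative assumption, not scVDMC itself).

    datasets:     list of datasets, each a list of expression vectors (cells).
    init_centers: k starting centers, copied for every dataset.
    lam:          hypothetical weight pulling each dataset's centers toward
                  the cross-dataset consensus center of the same cluster.
    Returns (labels, centers), each with one entry per dataset.
    """
    k = len(init_centers)
    centers = [[list(c) for c in init_centers] for _ in datasets]
    labels = [[0] * len(cells) for cells in datasets]
    for _ in range(n_iter):
        # Consensus center of cluster j: mean of that cluster's centers
        # over all datasets, held fixed during this sweep.
        consensus = [mean([centers[d][j] for d in range(len(datasets))])
                     for j in range(k)]
        for d, cells in enumerate(datasets):
            # Assignment step: nearest center within the cell's own dataset.
            labels[d] = [min(range(k), key=lambda j: sqdist(x, centers[d][j]))
                         for x in cells]
            # Update step: minimizes sum ||x - c||^2 + lam * ||c - consensus||^2,
            # whose closed form is (sum of members + lam * consensus) / (n + lam).
            for j in range(k):
                members = [x for x, l in zip(cells, labels[d]) if l == j]
                n = len(members)
                if n + lam == 0:
                    continue  # empty cluster and no penalty: keep old center
                dim = len(consensus[j])
                total = [sum(col) for col in zip(*members)] if members else [0.0] * dim
                centers[d][j] = [(t + lam * m) / (n + lam)
                                 for t, m in zip(total, consensus[j])]
    return labels, centers


# Two made-up "replicates" with the same two cell groups but a batch shift.
rep_a = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
rep_b = [[1.0, 1.0], [1.1, 0.9], [6.0, 6.0], [6.2, 5.9]]
labels, centers = multitask_kmeans([rep_a, rep_b], [[0.0, 0.0], [5.0, 5.0]])
```

In this sketch each replicate keeps its own cluster centers, so a batch shift does not have to be absorbed by a single pooled center, while `lam` controls how strongly matching clusters are tied together across replicates. Setting `lam` to 0 reduces it to independent k-means runs per dataset.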
- Subjects :
- 0301 basic medicine
Lung Development
Organogenesis
Gene Expression
RNA-Seq
Stem cell marker
Machine Learning
Mice
Spectrum Analysis Techniques
Sequencing techniques
0302 clinical medicine
Single-cell analysis
Animal Cells
Medicine and Health Sciences
Cluster Analysis
Cell Cycle and Cell Division
Lung
lcsh:QH301-705.5
Connective Tissue Cells
Ecology
Applied Mathematics
Simulation and Modeling
High-Throughput Nucleotide Sequencing
RNA sequencing
Flow Cytometry
Epidermolysis Bullosa Dystrophica
Computational Theory and Mathematics
Spectrophotometry
Connective Tissue
Cell Processes
030220 oncology & carcinogenesis
Modeling and Simulation
Physical Sciences
Cytophotometry
Cellular Types
Anatomy
Single-Cell Analysis
Algorithms
Research Article
Genetic Markers
Collagen Type VII
Sequence analysis
Computational biology
Biology
Research and Analysis Methods
03 medical and health sciences
Cellular and Molecular Neuroscience
Genetics
medicine
Animals
Humans
Computer Simulation
natural sciences
Molecular Biology Techniques
Cluster analysis
Molecular Biology
Embryonic Stem Cells
Ecology, Evolution, Behavior and Systematics
Models, Genetic
Sequence Analysis, RNA
Gene Expression Profiling
Epidermolysis bullosa dystrophica
Biology and Life Sciences
Computational Biology
Marker Genes
Cell Biology
Fibroblasts
medicine.disease
Gene expression profiling
Biological Tissue
030104 developmental biology
lcsh:Biology (General)
Sample size determination
Case-Control Studies
Leukocytes, Mononuclear
RNA
Organism Development
Mathematics
Developmental Biology
Details
- Language :
- English
- ISSN :
- 15537358
- Volume :
- 14
- Issue :
- 4
- Database :
- OpenAIRE
- Journal :
- PLoS Computational Biology
- Accession number :
- edsair.doi.dedup.....be3d38eea62f36cf4e2b71331a156717