
scClassify: sample size estimation and multiscale classification of cells using single and multiple reference.

Authors :
Lin, Yingxin
Cao, Yue
Kim, Hani Jieun
Salim, Agus
Speed, Terence P
Lin, David M
Yang, Pengyi
Yang, Jean Yee Hwa
Source :
Molecular Systems Biology; Jun 2020, Vol. 16 Issue 6, p1-16, 16p
Publication Year :
2020

Abstract

Automated cell type identification is a key computational challenge in single‐cell RNA‐sequencing (scRNA‐seq) data. To capitalise on the large collection of well‐annotated scRNA‐seq datasets, we developed scClassify, a multiscale classification framework based on ensemble learning and cell type hierarchies constructed from single or multiple annotated datasets as references. scClassify enables the estimation of sample size required for accurate classification of cell types in a cell type hierarchy and allows joint classification of cells when multiple references are available. We show that scClassify consistently performs better than other supervised cell type classification methods across 114 pairs of reference and testing data, representing a diverse combination of sizes, technologies and levels of complexity, and further demonstrate the unique components of scClassify through simulations and compendia of experimental datasets. Finally, we demonstrate the scalability of scClassify on large single‐cell atlases and highlight a novel application of identifying subpopulations of cells from the Tabula Muris data that were unidentified in the original publication. Together, scClassify represents state‐of‐the‐art methodology in automated cell type identification from scRNA‐seq data.

Synopsis: scClassify is a multiscale classification framework based on ensemble learning and cell type hierarchies, enabling sample size estimation required for accurate cell type classification and joint classification of cells using multiple references.
scClassify performs multiscale cell type classification based on cell type hierarchies constructed from single or multiple reference datasets. It implements a post‐hoc clustering procedure for discovering novel cell types from cells that are unassigned due to the absence of their types in the reference data. It enables the estimation of the number of cells required in a reference dataset to accurately discriminate a given cell type in a cell type hierarchy. Application to large atlas datasets such as Tabula Muris demonstrates its ability to refine cell types and identify cells from sub‐populations. [ABSTRACT FROM AUTHOR]
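The idea of multiscale classification on a cell type hierarchy, where a cell may be labelled at a fine level, stopped at a coarser intermediate level, or left unassigned, can be illustrated with a toy sketch. This is not scClassify's implementation: the record does not describe its ensemble classifiers, similarity measures or stopping rules. Everything here is a hypothetical stand-in, including `CellTypeNode`, `classify`, the cosine-to-centroid scoring, the `min_sim` and `min_margin` thresholds, and the tiny four-gene centroids.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CellTypeNode:
    """One node of a hypothetical cell type hierarchy (toy centroid per type)."""
    name: str
    centroid: list[float]
    children: list["CellTypeNode"] = field(default_factory=list)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify(cell, root, min_sim=0.7, min_margin=0.1):
    """Walk the hierarchy top-down and return the deepest confident label.

    Returns "unassigned" if the cell cannot be placed even at the first level,
    or a coarser intermediate label if it is ambiguous between child types.
    (Assumed stopping rule, for illustration only.)
    """
    node, label = root, "unassigned"
    while node.children:
        # Score every child type and rank from most to least similar.
        scores = sorted(
            ((cosine(cell, c.centroid), c) for c in node.children),
            key=lambda sc: sc[0],
            reverse=True,
        )
        best_sim, best = scores[0]
        runner_up = scores[1][0] if len(scores) > 1 else -math.inf
        if best_sim < min_sim or best_sim - runner_up < min_margin:
            break  # not confident enough to descend: keep the current level
        node, label = best, best.name
    return label


# Hypothetical two-level hierarchy over four toy "genes".
tree = CellTypeNode("root", [], [
    CellTypeNode("Immune", [1, 1, 0, 0], [
        CellTypeNode("T cell", [1, 0.2, 0, 0]),
        CellTypeNode("B cell", [0.2, 1, 0, 0]),
    ]),
    CellTypeNode("Epithelial", [0, 0, 1, 1]),
])

print(classify([1, 0.1, 0, 0], tree))   # fine-level label: T cell
print(classify([1, 1, 0, 0], tree))     # ambiguous child: stops at Immune
print(classify([1, -1, 0, 0], tree))    # matches no type: unassigned
```

In this sketch, stopping at an intermediate node rather than forcing a leaf label is what makes the output "multiscale". Cells returned as "unassigned" are the kind of cells the synopsis describes as candidates for post‐hoc clustering into novel types.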

Details

Language :
English
ISSN :
17444292
Volume :
16
Issue :
6
Database :
Complementary Index
Journal :
Molecular Systems Biology
Publication Type :
Academic Journal
Accession number :
144299551
Full Text :
https://doi.org/10.15252/msb.20199389