HieRFIT: a hierarchical cell type classification tool for projections from complex single-cell atlas datasets
- Source :
- Bioinformatics. 37:4431-4436
- Publication Year :
- 2021
- Publisher :
- Oxford University Press (OUP), 2021.
Abstract
- Motivation: The emergence of single-cell RNA sequencing (scRNA-seq) has led to an explosion of novel methods for studying biological variation among individual cells and for classifying cells into functional, biologically meaningful categories.
- Results: Here, we present a new cell type projection tool, Hierarchical Random Forest for Information Transfer (HieRFIT), based on hierarchical random forests. HieRFIT uses a priori information about cell type relationships to improve classification accuracy, taking as input the reference data along with a hierarchical tree structure representing the class relationships. We use an ensemble approach that combines multiple random forest models organized in a hierarchical decision tree structure. We show that this hierarchical classification approach improves accuracy and reduces incorrect predictions, especially for inter-dataset tasks, which reflect real-life applications. We use a scoring scheme that adjusts the probability distributions over candidate class labels and resolves uncertainties; when necessary, it labels cells at internal nodes of the hierarchy rather than assigning them to incorrect types.
- Availability and implementation: HieRFIT is implemented as an R package and is available at https://github.com/yasinkaymaz/HieRFIT/releases/tag/v1.0.0.
- Supplementary information: Supplementary data are available at Bioinformatics online.
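The abstract describes classification that descends a class hierarchy and, when the evidence is ambiguous, stops at an internal node instead of forcing a leaf label. The following is a minimal Python sketch of that general idea only; it is not HieRFIT's implementation, which is an R package built on random forests with its own scoring scheme. The `Node` and `classify` names, the toy per-node classifiers (stand-ins for trained random forests), and the fixed `min_prob` stopping rule are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical local classifier: maps a feature vector to probabilities
# over the children of one internal node (a stand-in for a random forest).
LocalClassifier = Callable[[List[float]], Dict[str, float]]


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    classifier: Optional[LocalClassifier] = None  # set only on internal nodes


def classify(root: Node, x: List[float], min_prob: float = 0.6) -> Tuple[str, float]:
    """Descend the hierarchy; stop at an internal node when uncertain.

    min_prob is a hypothetical confidence threshold, not HieRFIT's scoring.
    Returns the assigned label and the product of probabilities along the path.
    """
    node, path_prob = root, 1.0
    while node.children:
        probs = node.classifier(x)
        best = max(probs, key=probs.get)
        if probs[best] < min_prob:
            break  # too uncertain: keep the current internal-node label
        path_prob *= probs[best]
        node = next(c for c in node.children if c.name == best)
    return node.name, path_prob


def clamp(v: float) -> float:
    return min(max(v, 0.0), 1.0)


# Toy classifiers keyed on two made-up marker scores (assumption).
def root_clf(x: List[float]) -> Dict[str, float]:
    return {"Immune": clamp(x[0]), "Neuron": 1.0 - clamp(x[0])}


def immune_clf(x: List[float]) -> Dict[str, float]:
    return {"T cell": clamp(x[1]), "B cell": 1.0 - clamp(x[1])}


tree = Node("Cell", [
    Node("Immune", [Node("T cell"), Node("B cell")], immune_clf),
    Node("Neuron"),
], root_clf)

print(classify(tree, [0.9, 0.8]))  # confident at both levels: a leaf label
print(classify(tree, [0.9, 0.5]))  # ambiguous T vs B: stays at "Immune"
```

Stopping at "Immune" for the second cell mirrors the abstract's point that an internal-node label is preferable to a confident but possibly wrong leaf assignment.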
- Subjects :
- Statistics and Probability
- 0303 health sciences
- Information transfer
- Hierarchy (mathematics)
- Computer science
- Decision tree
- computer.software_genre
- Biochemistry
- Class (biology)
- Computer Science Applications
- Random forest
- 03 medical and health sciences
- Computational Mathematics
- Projection (relational algebra)
- 0302 clinical medicine
- Tree structure
- Computational Theory and Mathematics
- Probability distribution
- Data mining
- Molecular Biology
- computer
- 030217 neurology & neurosurgery
- 030304 developmental biology
Details
- ISSN :
- 1460-2059 and 1367-4803
- Volume :
- 37
- Database :
- OpenAIRE
- Journal :
- Bioinformatics
- Accession number :
- edsair.doi.dedup.....a17d542b5ca3d091b1bcdd462ea196f2