Screen technical noise in single cell RNA sequencing data.
- Author
- Bai, Yu-Long; Baddoo, Melody; Flemington, Erik K.; Nakhoul, Hani N.; Liu, Yao-Zhong
- Subjects
- *RNA sequencing, *GENE libraries, *DATA scrubbing, *GENE expression, *NOISE
- Abstract
We proposed a data-cleaning pipeline for single-cell (SC) RNA-seq data that first screens genes (gene-wise screening) and then screens cell libraries (library-wise screening). Gene-wise screening rests on the expectation that, for a gene with low technical noise, the gene's count in a library tends to increase as library size increases; this was tested using negative binomial regression of gene count (dependent variable) on library size (independent variable). Library-wise screening rests on the expectation that, in libraries with low technical noise, across-library correlations for housekeeping (HK) genes are higher than those for non-housekeeping (NHK) genes. We removed libraries whose mean pairwise correlation for HK genes is NOT significantly higher than that for NHK genes. We successfully applied the pipeline to two large SC RNA-seq datasets. The pipeline was also developed into an R package.
• Single-cell RNA sequencing measures gene expression at the level of individual cells.
• Single-cell RNA sequencing data typically contain a large amount of technical noise.
• A pipeline is proposed to remove technical noise based on housekeeping genes.
• The pipeline relies on rigorous statistical analyses that avoid arbitrary cutoffs.
• A software program is available to implement the pipeline.
[ABSTRACT FROM AUTHOR]
- Published
- 2020
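The two screening steps described in the abstract can be sketched in Python as below. This is a minimal illustration under our own assumptions, not the authors' R package. For gene-wise screening, it fits a negative binomial (NB2, log link) regression by iteratively reweighted least squares, uses log library size as the covariate, holds the dispersion `theta` fixed, and tests for a positive slope with a one-sided Wald test. For library-wise screening, it substitutes a one-sided paired sign test for the significance test used in the paper, which the abstract does not name. The function names `nb_slope_test`, `pearson`, and `library_screen`, the dict-of-dicts input layout, and the `theta` and `alpha` defaults are all hypothetical.

```python
import math
from statistics import NormalDist, fmean


def nb_slope_test(counts, lib_sizes, theta=10.0, max_iter=50):
    """Gene-wise screening sketch: fit log E[count] = b0 + b1 * log(library size)
    with a negative binomial (NB2) model by IRLS, holding the dispersion theta
    fixed (an assumption). Returns (b1, one-sided Wald p-value for b1 > 0)."""
    x = [math.log(s) for s in lib_sizes]
    b0, b1 = math.log(fmean(counts) + 0.5), 0.0  # start from a flat fit
    for _ in range(max_iter):
        # Accumulate X^T W X (s00, s01, s11) and X^T W z (t0, t1).
        s00 = s01 = s11 = t0 = t1 = 0.0
        for xi, yi in zip(x, counts):
            eta = b0 + b1 * xi
            mu = math.exp(eta)
            w = mu / (1.0 + mu / theta)  # IRLS weight: var = mu + mu^2 / theta
            z = eta + (yi - mu) / mu     # working response for the log link
            s00 += w
            s01 += w * xi
            s11 += w * xi * xi
            t0 += w * z
            t1 += w * xi * z
        det = s00 * s11 - s01 * s01  # zero if all library sizes are equal
        new_b0 = (s11 * t0 - s01 * t1) / det
        new_b1 = (s00 * t1 - s01 * t0) / det
        done = abs(new_b0 - b0) + abs(new_b1 - b1) < 1e-10
        b0, b1 = new_b0, new_b1
        if done:
            break
    se_b1 = math.sqrt(s00 / det)  # slope entry of (X^T W X)^{-1}
    return b1, 1.0 - NormalDist().cdf(b1 / se_b1)


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    ma, mb = fmean(a), fmean(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    num = sum(p * q for p, q in zip(da, db))
    return num / math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))


def library_screen(expr, hk_genes, alpha=0.05):
    """Library-wise screening sketch. `expr` maps library -> {gene: expression}
    (hypothetical layout). For each library, its correlations with every other
    library on HK genes are compared with those on NHK genes using a one-sided
    paired sign test (a stand-in for the paper's test). Only libraries where the
    HK correlations are significantly higher are kept."""
    libs = list(expr)
    genes = list(expr[libs[0]])
    hk = [g for g in genes if g in hk_genes]
    nhk = [g for g in genes if g not in hk_genes]
    kept = []
    for i in libs:
        wins = n = 0
        for j in libs:
            if j == i:
                continue
            r_hk = pearson([expr[i][g] for g in hk], [expr[j][g] for g in hk])
            r_nhk = pearson([expr[i][g] for g in nhk], [expr[j][g] for g in nhk])
            if r_hk != r_nhk:  # ties carry no sign and are dropped
                n += 1
                wins += r_hk > r_nhk
        # P(X >= wins) for X ~ Binomial(n, 0.5)
        p = sum(math.comb(n, k) for k in range(wins, n + 1)) / 2 ** n
        if p < alpha:
            kept.append(i)
    return kept
```

For example, a gene whose counts are exactly proportional to library size should give a slope close to 1 from `nb_slope_test`. In this sketch, a gene is a candidate for removal when its slope is not significantly positive. The sign test is a deliberately simple choice. It is exact and needs only the standard library, but it cannot reach p < 0.05 unless a library is compared with at least five others.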