1. Distribution-free complex hypothesis testing for single-cell RNA-seq differential expression analysis
- Author
Boris P. Hejblum, Rodolphe Thiébaut, Denis Agniel, Marine Gauthier
Affiliations: Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM); Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria); Vaccine Research Institute (VRI), Université Paris-Est Créteil Val-de-Marne - Paris 12 (UPEC UP12)
- Subjects
0303 health sciences, Computer science, Design of experiments, Cumulative distribution function, Asymptotic distribution, 01 natural sciences, 010104 statistics & probability, 03 medical and health sciences, [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST], Resampling, Covariate, Test statistic, Benchmark (computing), [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM], 0101 mathematics, Algorithm, 030304 developmental biology, Statistical hypothesis testing
- Abstract
Summary: State-of-the-art methods for single-cell RNA sequencing (scRNA-seq) Differential Expression Analysis (DEA) often rely on strong distributional assumptions that are difficult to verify in practice. Furthermore, while the increasing complexity of clinical and biological single-cell studies calls for greater tool versatility, the majority of existing methods only tackle the comparison between two conditions. We propose a novel, distribution-free, and flexible approach to DEA for single-cell RNA-seq data. This new method, called ccdf, tests the association of each gene's expression with one or many variables of interest (which can be either continuous or discrete), while potentially adjusting for additional covariates. To test such complex hypotheses, ccdf uses a conditional independence test relying on the conditional cumulative distribution function, estimated through multiple regressions. We provide the asymptotic distribution of the ccdf test statistic as well as a permutation test (for use when the number of observed cells is not sufficiently large). ccdf substantially expands the possibilities for scRNA-seq DEA studies: it obtains good statistical performance in various simulation scenarios considering complex experimental designs (i.e. beyond the two-condition comparison), while retaining competitive performance with state-of-the-art methods in a two-condition benchmark. We apply ccdf to a large publicly available scRNA-seq dataset of 84,140 SARS-CoV-2 reactive CD8+ T cells, in order to identify the differentially expressed genes across 3 groups of COVID-19 severity (mild, hospitalized, and ICU) while accounting for seven different cellular subpopulations.
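The idea summarized in the abstract (estimating the conditional cumulative distribution function through several regressions, then calibrating by permutation) can be illustrated with a minimal sketch. This is not the authors' implementation or the API of the ccdf package. The function names, the choice of quantile thresholds, the least-squares fit on indicator variables, and the sum-of-squared-coefficients statistic are all assumptions made for illustration. The permutation step shown is only valid without adjustment covariates, since it shuffles the variable of interest freely.

```python
import numpy as np

def ccdf_style_statistic(y, x, z=None, n_thresholds=10):
    """Hypothetical CCDF-style statistic for one gene (illustrative only).

    y: expression values; x: variable(s) of interest; z: optional covariates.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    x = np.asarray(x, dtype=float).reshape(n, -1)
    cols = [np.ones((n, 1)), x]
    if z is not None:
        cols.append(np.asarray(z, dtype=float).reshape(n, -1))
    design = np.hstack(cols)

    # Assumption: thresholds are interior quantiles of the observed expression.
    probs = np.linspace(0, 1, n_thresholds + 2)[1:-1]
    thresholds = np.unique(np.quantile(y, probs))

    # One regression per threshold: the indicator 1{y <= threshold} is
    # regressed on the design matrix, all thresholds solved at once.
    indicators = (y[:, None] <= thresholds[None, :]).astype(float)
    coef, *_ = np.linalg.lstsq(design, indicators, rcond=None)

    # Coefficients attached to x (rows after the intercept) measure how the
    # estimated conditional CDF shifts with x.
    beta_x = coef[1:1 + x.shape[1], :]
    return n * float(np.sum(beta_x ** 2))

def permutation_pvalue(y, x, n_perm=500, n_thresholds=10, seed=0):
    """Permutation p-value without covariates (x is shuffled freely)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = ccdf_style_statistic(y, x, n_thresholds=n_thresholds)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(y))
        stat = ccdf_style_statistic(y, x[perm], n_thresholds=n_thresholds)
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

In a differential expression setting such a function would be called once per gene, followed by a multiple-testing correction across genes. Adjusting for covariates would require a permutation scheme that preserves the relationship between the covariates and the expression, which this sketch does not attempt.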
- Published
- 2021