Large scale maximum average power multiple inference on time‐course count data with application to RNA‐seq analysis
- Source :
- Biometrics. 76:9-22
- Publication Year :
- 2019
- Publisher :
- Wiley, 2019.
- Abstract :
- Experiments that longitudinally collect RNA sequencing (RNA-seq) data can provide transformative insights in biology research by revealing the dynamic patterns of genes. Such experiments create a great demand for new analytic approaches to identify differentially expressed (DE) genes based on large-scale time-course count data. Existing methods, however, are suboptimal with respect to power and may lack theoretical justification. Furthermore, most existing tests are designed to distinguish among conditions based on overall differential patterns across time, though in practice, a variety of composite hypotheses are of more scientific interest. Finally, some current methods may fail to control the false discovery rate. In this paper, we propose a new model and testing procedure to address the above issues simultaneously. Specifically, conditional on a latent Gaussian mixture with evolving means, we model the data by negative binomial distributions. Motivated by Storey (2007) and Hwang and Liu (2010), we introduce a general testing framework based on the proposed model and show that the proposed test enjoys the optimality property of maximum average power. The test allows not only identification of traditional DE genes but also testing of a variety of composite hypotheses of biological interest. We establish the identifiability of the proposed model, implement the proposed method via efficient algorithms, and demonstrate its good performance via simulation studies. When applied to data from an experiment that examines the effect of varying light environments on the fundamental physiology of the marine diatom Phaeodactylum tricornutum, the procedure reveals interesting biological insights.
- Subjects :
- Statistics and Probability
False discovery rate
Biometry
Scale (ratio)
Gaussian
Normal Distribution
Negative binomial distribution
Inference
01 natural sciences
General Biochemistry, Genetics and Molecular Biology
010104 statistics & probability
03 medical and health sciences
Humans
Computer Simulation
RNA-Seq
0101 mathematics
030304 developmental biology
0303 health sciences
General Immunology and Microbiology
Gene Expression Profiling
Applied Mathematics
General Medicine
Binomial Distribution
Identification (information)
Identifiability
Data mining
General Agricultural and Biological Sciences
Algorithms
Count data
Details
- ISSN :
- 1541-0420 and 0006-341X
- Volume :
- 76
- Database :
- OpenAIRE
- Journal :
- Biometrics
- Accession number :
- edsair.doi.dedup.....a79531695e70d8663ce5d51b1ec9b5b5
- Full Text :
- https://doi.org/10.1111/biom.13144