A selective inference approach for false discovery rate control using multiomics covariates yields insights into disease risk
- Source :
- Proceedings of the National Academy of Sciences of the United States of America
- Publication Year :
- 2020
- Publisher :
- Proceedings of the National Academy of Sciences, 2020.
Abstract
- Significance: Variation is rampant throughout human genomes: some of it affects disease risk, and most does not; separating the two requires a plethora of hypothesis tests. This challenge of multiple testing (limiting false positives while maximizing power) arises in many “omics” studies and sciences. One approach is to control the false discovery rate (FDR), and a recent selective inference method for controlling FDR, adaptive P-value thresholding (AdaPT), facilitates incorporation of auxiliary information (covariates) related to each hypothesis test. How AdaPT performs on data is an open question. We apply AdaPT to results from genomic association studies and include many covariates. This adaptive search discovers a more complex and interpretable model with far greater power than classic multiple testing procedures.
To correct for a large number of hypothesis tests, most researchers rely on simple multiple testing corrections. Yet, new methodologies of selective inference could potentially improve power while retaining statistical guarantees, especially those that enable exploration of test statistics using auxiliary information (covariates) to weight hypothesis tests for association. We explore one such method, adaptive P-value thresholding (AdaPT), in the framework of genome-wide association studies (GWAS) and gene expression/coexpression studies, with particular emphasis on schizophrenia (SCZ). Selected SCZ GWAS association P values play the role of the primary data for AdaPT; single-nucleotide polymorphisms (SNPs) are selected because they are gene expression quantitative trait loci (eQTLs). This natural pairing of SNPs and genes allows us to map the following covariate values to these pairs: GWAS statistics from genetically correlated bipolar disorder, the effect size of SNP genotypes on gene expression, and gene–gene coexpression, captured by subnetwork (module) membership.
In all, 24 covariates per SNP/gene pair were included in the AdaPT analysis using flexible gradient boosted trees. We demonstrate a substantial increase in power to detect SCZ associations using gene expression information from the developing human prefrontal cortex. We interpret these results in light of recent theories about the polygenic nature of SCZ. Importantly, our entire process for identifying enrichment and creating features with independent complementary data sources can be implemented in many different high-throughput settings to ultimately improve power.
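For readers unfamiliar with FDR control, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure, a commonly used covariate-free FDR method. The abstract does not name which classic procedures were compared, so treating Benjamini-Hochberg as the baseline is an assumption; the function name `benjamini_hochberg` and the example P values are hypothetical and not taken from the paper. This is not AdaPT itself, which additionally uses covariates to adapt the rejection threshold.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return sorted indices of hypotheses rejected by the BH step-up rule.

    Illustrative baseline only (assumption: not named in the abstract).
    """
    m = len(p_values)
    if m == 0:
        return []
    # Order hypothesis indices by ascending P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max smallest P values (step-up: earlier failures are ignored).
    return sorted(order[:k_max])


# Hypothetical P values for ten tests at a target FDR level of 0.05.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(example_p, alpha=0.05)
```

Because every test is compared against the same rank-based threshold, this procedure cannot use side information such as eQTL effect sizes or coexpression module membership, which is the gap that covariate-aware methods like AdaPT aim to fill.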
- Subjects :
- False discovery rate
Multifactorial Inheritance
Bipolar Disorder
Genotype
Quantitative Trait Loci
Inference
Genome-wide association study
Computational biology
Quantitative trait locus
Biology
eQTL
Polymorphism, Single Nucleotide
GWAS
Humans
Genetic Predisposition to Disease
Statistical hypothesis testing
Genetic association
Multidisciplinary
Biological Sciences
multiple hypothesis testing
Biophysics and Computational Biology
neuropsychiatric disorders
Expression quantitative trait loci
Multiple comparisons problem
Schizophrenia
false discovery rate
Algorithms
Genome-Wide Association Study
Details
- ISSN :
- 1091-6490 and 0027-8424
- Volume :
- 117
- Database :
- OpenAIRE
- Journal :
- Proceedings of the National Academy of Sciences
- Accession number :
- edsair.doi.dedup.....a0785ac6e4a19f5c7ba35b4597ef7481