1. A Three-groups Non-local Model for Combining Heterogeneous Data Sources to Identify Genes Associated with Parkinson's Disease
- Author
Wixson, Troy P., Shaby, Benjamin A., Philtron, Daisy L., International Parkinson Disease Genomics Consortium, Lima, Leandro A., Wyman, Stacia K., Kaye, Julia A., and Finkbeiner, Steven
- Subjects
Statistics - Applications
- Abstract
We seek to identify genes involved in Parkinson's Disease (PD) by combining information across different experiment types. Each experiment, taken individually, may contain too little information to distinguish some important genes from incidental ones. However, when experiments are combined using the proposed statistical framework, additional power emerges. The fundamental building block of the family of statistical models that we propose is a hierarchical three-group mixture of distributions. Each gene is modeled probabilistically as belonging to either a null group that is unassociated with PD, a deleterious group, or a beneficial group. This three-group formalism has two key features. By apportioning prior probability of group assignments with a Dirichlet distribution, the resultant posterior group probabilities automatically account for the multiplicity inherent in analyzing many genes simultaneously. By building models for experimental outcomes conditionally on the group labels, any number of data modalities may be combined in a single coherent probability model, allowing information sharing across experiment types. These two features result in parsimonious inference with few false positives, while simultaneously enhancing power to detect signals. Simulations show that our three-groups approach performs at least as well as commonly-used tools for GWAS and RNA-seq, and in some cases it performs better. We apply our proposed approach to publicly-available GWAS and RNA-seq datasets, discovering novel genes that are potential therapeutic targets.
- Comment
26 pages, 6 figures, 4 tables
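The three-group mixture described in the abstract can be illustrated with a small sketch. This is not the authors' model. It assumes each experiment yields one z-score per gene, that the null group is N(0, 1), and that the deleterious and beneficial groups are unit-variance normals shifted by a hypothetical `shift` of +3 and -3. The group weights are fixed at illustrative values, whereas the abstract places a Dirichlet prior on them. All function names and numbers here are assumptions made for illustration.

```python
import math

# Group labels named in the abstract.
GROUPS = ("null", "deleterious", "beneficial")


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def group_likelihoods(z, shift=3.0):
    """Likelihood of one z-score under each group (assumed densities)."""
    return (
        normal_pdf(z, 0.0, 1.0),     # null: no association
        normal_pdf(z, shift, 1.0),   # deleterious: assumed positive shift
        normal_pdf(z, -shift, 1.0),  # beneficial: assumed negative shift
    )


def posterior_group_probs(z_scores, weights=(0.90, 0.05, 0.05)):
    """Posterior group probabilities for one gene.

    z_scores holds one score per data modality. Modalities are treated
    as conditionally independent given the group label, so their
    likelihoods multiply. weights are fixed illustrative prior
    probabilities, not estimated from data.
    """
    joint = list(weights)
    for z in z_scores:
        for k, lik in enumerate(group_likelihoods(z)):
            joint[k] *= lik
    total = sum(joint)
    return {g: j / total for g, j in zip(GROUPS, joint)}


if __name__ == "__main__":
    print(posterior_group_probs([2.0]))       # one modality
    print(posterior_group_probs([2.0, 2.0]))  # two modalities combined
```

Under these assumptions, a gene with a moderate score in a single experiment stays mostly in the null group, while the same moderate score observed in two experiments shifts more posterior mass to the deleterious group. This mirrors the abstract's claim that combining modalities adds power.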
- Published
- 2024