Ensembled best subset selection using summary statistics for polygenic risk prediction.
- Source :
- BioRxiv : the preprint server for biology [bioRxiv] 2023 Sep 27. Date of Electronic Publication: 2023 Sep 27.
- Publication Year :
- 2023
Abstract
- Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, yet existing methods face a tradeoff between predictive power and computational efficiency. We introduce ALL-Sum, a fast and scalable PRS method that combines an efficient summary statistic-based L₀L₂ penalized regression algorithm with an ensembling step that aggregates estimates from different tuning parameters for improved prediction performance. In extensive large-scale simulations across a wide range of polygenicity and genome-wide association studies (GWAS) sample sizes, ALL-Sum consistently outperforms popular alternative methods in terms of prediction accuracy, runtime, and memory usage. We analyze 27 published GWAS summary statistics for 11 complex traits from 9 reputable data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen, evaluated using individual-level UKBB data. ALL-Sum achieves the highest accuracy for most traits, particularly for GWAS with large sample sizes. We provide ALL-Sum as a user-friendly command-line software with pre-computed reference data for streamlined user-end analysis.
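The abstract names two ingredients: an L₀L₂-penalized regression fitted from summary statistics, and an ensembling step over tuning parameters. The Python sketch below illustrates the general idea only; it is not the ALL-Sum implementation, and the abstract does not specify the algorithm's details. It assumes a generic objective βᵀRβ − 2rᵀβ + λ₀‖β‖₀ + λ₂‖β‖₂², where `R` is an LD (SNP correlation) matrix and `r` holds marginal SNP-trait correlations, solved by plain coordinate descent. The ensembling shown (least-squares stacking on a small tuning set) is likewise an assumption. All function and parameter names are hypothetical.

```python
import numpy as np


def l0l2_coordinate_descent(R, r, lam0, lam2, n_iter=100, tol=1e-8):
    """Illustrative coordinate descent for the assumed objective
    beta' R beta - 2 r' beta + lam0 * ||beta||_0 + lam2 * ||beta||_2^2.

    R: (p, p) LD matrix; r: (p,) marginal correlations.
    Hypothetical helper, not taken from the ALL-Sum software.
    """
    p = len(r)
    beta = np.zeros(p)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of SNP j with the trait after removing the
            # contribution of all other currently selected SNPs.
            c = r[j] - R[j] @ beta + R[j, j] * beta[j]
            denom = R[j, j] + lam2
            # Keep SNP j only if the drop in the quadratic loss
            # (c^2 / denom) exceeds the L0 cost of a nonzero entry.
            new = c / denom if c * c / denom > lam0 else 0.0
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


def stack_estimates(betas, G_tune, y_tune):
    """Combine estimates from several tuning parameters.

    betas: (K, p) array, one row per (lam0, lam2) pair.
    G_tune: (n, p) genotypes and y_tune: (n,) phenotypes of a tuning set.
    Assumed stacking rule: least-squares weights on the K candidate scores.
    """
    scores = G_tune @ betas.T                      # (n, K) candidate PRS
    weights, *_ = np.linalg.lstsq(scores, y_tune, rcond=None)
    return betas.T @ weights                       # (p,) combined effects


# Toy usage with uncorrelated SNPs (identity LD matrix).
R = np.eye(3)
r = np.array([0.5, 0.05, -0.4])
grid = [(0.01, 0.0), (0.01, 1.0)]
betas = np.array([l0l2_coordinate_descent(R, r, l0, l2) for l0, l2 in grid])
```

In this toy setting the L₀ term drops the weak middle SNP outright, while the L₂ term shrinks the surviving effects; the stacking step then weights the candidate scores rather than selecting a single tuning parameter.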
Details
- Language :
- English
- Database :
- MEDLINE
- Journal :
- BioRxiv : the preprint server for biology
- Accession number :
- 37886515
- Full Text :
- https://doi.org/10.1101/2023.09.25.559307