
Ensembled best subset selection using summary statistics for polygenic risk prediction.

Authors :
Chen T
Zhang H
Mazumder R
Lin X
Source :
BioRxiv : the preprint server for biology [bioRxiv] 2023 Sep 27. Date of Electronic Publication: 2023 Sep 27.
Publication Year :
2023

Abstract

Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, yet existing methods face a tradeoff between predictive power and computational efficiency. We introduce ALL-Sum, a fast and scalable PRS method that combines an efficient summary statistic-based L0L2 penalized regression algorithm with an ensembling step that aggregates estimates from different tuning parameters for improved prediction performance. In extensive large-scale simulations across a wide range of polygenicity and genome-wide association studies (GWAS) sample sizes, ALL-Sum consistently outperforms popular alternative methods in terms of prediction accuracy, runtime, and memory usage. We analyze 27 published GWAS summary statistics for 11 complex traits from 9 reputable data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen, evaluated using individual-level UKBB data. ALL-Sum achieves the highest accuracy for most traits, particularly for GWAS with large sample sizes. We provide ALL-Sum as a user-friendly command-line software with pre-computed reference data for streamlined user-end analysis.
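
The two ingredients named in the abstract, an L0L2-penalized regression fitted from summary statistics and an ensembling step over tuning parameters, can be illustrated with a minimal sketch. This is not the ALL-Sum implementation: the abstract does not give the objective, the solver, or the ensembling rule, so everything below is an assumption. The sketch assumes standardized marginal effect estimates r and a reference LD (correlation) matrix R, minimizes 0.5*b'Rb - b'r + lam0*||b||_0 + lam2*||b||_2^2 by plain coordinate descent, and combines the resulting scores by least squares on a hypothetical individual-level tuning set. The function names fit_l0l2_summary and fit_ensemble_weights are illustrative only.

```python
import numpy as np


def fit_l0l2_summary(R, r, lam0, lam2, n_iter=100, tol=1e-8):
    """Coordinate descent for an assumed summary-statistic L0L2 objective:
    0.5 * b'Rb - b'r + lam0 * ||b||_0 + lam2 * ||b||_2^2.

    R : (p, p) LD/correlation matrix from a reference panel (assumption).
    r : (p,) standardized marginal effect estimates from GWAS (assumption).
    """
    p = len(r)
    beta = np.zeros(p)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Marginal signal for SNP j after removing the other SNPs' fitted effects.
            rho = r[j] - R[j] @ beta + R[j, j] * beta[j]
            denom = R[j, j] + 2.0 * lam2
            # Ridge-shrunken update; keep it only if the objective decrease
            # it buys (rho^2 / (2 * denom)) exceeds the L0 cost lam0.
            new = rho / denom if rho ** 2 / (2.0 * denom) > lam0 else 0.0
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


def fit_ensemble_weights(X_tune, y_tune, betas):
    """Least-squares weights (intercept first) for combining the scores
    produced by several tuning-parameter settings. Using individual-level
    tuning data here is an assumption of this sketch."""
    scores = X_tune @ np.column_stack(betas)
    design = np.column_stack([np.ones(len(y_tune)), scores])
    coef, *_ = np.linalg.lstsq(design, y_tune, rcond=None)
    return coef


# Toy usage: three uncorrelated SNPs and two (lam0, lam2) settings.
R_toy = np.eye(3)
r_toy = np.array([0.5, 0.01, -0.3])
grid = [(0.01, 0.0), (0.01, 0.5)]
betas_toy = [fit_l0l2_summary(R_toy, r_toy, l0, l2) for l0, l2 in grid]
```

In this toy case the second SNP is dropped under both settings because its signal is too weak to pay the L0 cost, while the L2 term only shrinks the retained effects. A real analysis would use block-wise LD matrices and a much larger tuning grid.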

Details

Language :
English
Database :
MEDLINE
Journal :
BioRxiv : the preprint server for biology
Accession number :
37886515
Full Text :
https://doi.org/10.1101/2023.09.25.559307