Making the Most of Clumping and Thresholding for Polygenic Scores
- Source :
- Privé, F, Vilhjálmsson, B J, Aschard, H & Blum, M G B 2019, 'Making the Most of Clumping and Thresholding for Polygenic Scores', American Journal of Human Genetics (Am J Hum Genet), Elsevier (Cell Press), vol. 105, no. 6, pp. 1213-1221. https://doi.org/10.1016/j.ajhg.2019.11.001
- Publication Year :
- 2019
- Publisher :
- HAL CCSD, 2019.
Abstract
- Polygenic prediction has the potential to contribute to precision medicine. Clumping and Thresholding (C+T) is a widely used method to derive polygenic scores. When using C+T, it is common to test several p-value thresholds to maximize the predictive ability of the derived polygenic scores. Along with this p-value threshold, we propose to tune three other hyper-parameters for C+T. We implement an efficient way to derive thousands of different C+T polygenic scores corresponding to a grid over four hyper-parameters. For example, it takes a few hours to derive 123,200 different C+T scores for 300K individuals and 1M variants on a single node with 16 cores. We find that optimizing over these four hyper-parameters improves the predictive performance of C+T in both simulations and real data applications as compared to tuning only the p-value threshold. A particularly large increase can be noted when predicting depression status, from an AUC of 0.557 (95% CI: [0.544-0.569]) when tuning only the p-value threshold in C+T to an AUC of 0.592 (95% CI: [0.580-0.604]) when tuning all four hyper-parameters we propose for C+T. We further propose Stacked Clumping and Thresholding (SCT), a polygenic score that results from stacking all derived C+T scores. Instead of choosing one set of hyper-parameters that maximizes prediction in some training set, SCT learns an optimal linear combination of all C+T scores by using an efficient penalized regression. We apply SCT to 8 different case-control diseases in the UK Biobank data and find that SCT substantially improves prediction accuracy with an average AUC increase of 0.035 over standard C+T.
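The stacking idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes clumping has already produced a boolean mask of retained variants, varies only the p-value threshold (the abstract does not name the other three hyper-parameters), and substitutes a plain L2-penalized logistic regression fitted by gradient descent for the efficient penalized regression the paper refers to. All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np

def ct_scores(G, beta, pvals, kept, thresholds):
    """Return one C+T score per p-value threshold, as columns of an (n, k) matrix.

    G: (n, m) genotype matrix; beta: (m,) GWAS effect sizes; pvals: (m,) p-values;
    kept: (m,) boolean mask of variants surviving clumping (assumed precomputed).
    """
    cols = []
    for t in thresholds:
        mask = kept & (pvals <= t)            # thresholding step
        cols.append(G[:, mask] @ beta[mask])  # all zeros if no variant passes
    return np.column_stack(cols)

def fit_stack(S, y, lam=1.0, lr=0.1, n_iter=500):
    """Stack scores with L2-penalized logistic regression (gradient descent).

    This stands in for the paper's penalized regression; it is an assumption.
    """
    mu, sd = S.mean(axis=0), S.std(axis=0)
    sd[sd == 0] = 1.0                         # guard against constant columns
    Z = (S - mu) / sd
    n = len(y)
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        w -= lr * (Z.T @ (p - y) / n + lam * w / n)
        b -= lr * np.mean(p - y)
    return w, b, mu, sd

# Toy data, made up purely for illustration.
rng = np.random.default_rng(0)
n, m = 200, 50
G = rng.integers(0, 3, size=(n, m)).astype(float)
beta = rng.normal(size=m)
pvals = rng.uniform(size=m)
kept = np.arange(m) % 2 == 0                  # pretend clumping kept every other variant
liability = G @ beta + rng.normal(size=n)
y = (liability > np.median(liability)).astype(float)

thresholds = [0.01, 0.1, 0.5, 1.0]
S = ct_scores(G, beta, pvals, kept, thresholds)
w, b, mu, sd = fit_stack(S, y)
stacked_score = ((S - mu) / sd) @ w + b       # linear combination of all C+T scores
```

Standard C+T would keep the single column of `S` that predicts best in a training set, whereas the stacked score weights every column.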
- Subjects :
- 0301 basic medicine
MESH: United Kingdom
Multifactorial Inheritance
C+T
PRS
0302 clinical medicine
Statistics
MESH: Disease
Disease
MESH: Models, Genetic
Linear combination
Genetics (clinical)
Mathematics
Biological Specimen Banks
0303 health sciences
Training set
MESH: Biological Specimen Banks
MESH: Polymorphism, Single Nucleotide
MESH: Genetic Predisposition to Disease
Thresholding
MESH: Case-Control Studies
3. Good health
polygenic risk scores
stacking
[STAT.ME]Statistics [stat]/Methodology [stat.ME]
Algorithms
UK Biobank
MESH: Algorithms
Polymorphism, Single Nucleotide
Article
Set (abstract data type)
complex traits
03 medical and health sciences
Single node
MESH: Computer Simulation
Genetics
Humans
Computer Simulation
Genetic Predisposition to Disease
p-value
030304 developmental biology
Penalized regression
MESH: Humans
[SDV.GEN.GPO]Life Sciences [q-bio]/Genetics/Populations and Evolution [q-bio.PE]
Models, Genetic
clumping and thresholding
United Kingdom
030104 developmental biology
Case-Control Studies
MESH: Genome-Wide Association Study
[SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie
MESH: Multifactorial Inheritance
[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]
030217 neurology & neurosurgery
Genome-Wide Association Study
Details
- Language :
- English
- ISSN :
- 0002-9297 and 1537-6605
- Database :
- OpenAIRE
- Journal :
- American Journal of Human Genetics (Am J Hum Genet), Elsevier (Cell Press), 2019, 105 (6), pp. 1213-1221. ⟨10.1016/j.ajhg.2019.11.001⟩
- Accession number :
- edsair.doi.dedup.....477fe48dcfd1abdc7996cb2d6df19263