Making the Most of Clumping and Thresholding for Polygenic Scores

Authors :
Michael G. B. Blum
Florian Privé
Bjarni J. Vilhjálmsson
Hugues Aschard
Biologie Computationnelle et Mathématique (TIMC-IMAG-BCM)
Techniques de l'Ingénierie Médicale et de la Complexité - Informatique, Mathématiques et Applications, Grenoble - UMR 5525 (TIMC-IMAG)
Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])
Aarhus University [Aarhus]
Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI)
Institut Pasteur [Paris]-Centre National de la Recherche Scientifique (CNRS)
F.P. and M.G.B.B. acknowledge LabEx PERSYVAL-Lab (ANR-11-LABX-0025-01) and ANR project FROGH (ANR-16-CE12-0033). F.P. and M.G.B.B. also acknowledge the Grenoble Alpes Data Institute that is supported by the French National Research Agency under the 'Investissements d’avenir' program (ANR-15-IDEX-02). F.P. and B.J.V. acknowledge Niels Bohr Professorship from the Danish National Research Foundation to Prof. John J. McGrath, and the Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH (R248-2017-2003). This research has been conducted using the UK Biobank Resource under Application Number 25589.
ANR-11-LABX-0025,PERSYVAL-lab,Systemes et Algorithmes Pervasifs au confluent des mondes physique et numérique(2011)
ANR-16-CE12-0033,FROGH,Etude Génétique de la Population Française(2016)
ANR-15-IDEX-0002,UGA,IDEX UGA(2015)
Source :
American Journal of Human Genetics, Elsevier (Cell Press), 2019, 105 (6), pp. 1213-1221. https://doi.org/10.1016/j.ajhg.2019.11.001
Publication Year :
2019
Publisher :
HAL CCSD, 2019.

Abstract

Polygenic prediction has the potential to contribute to precision medicine. Clumping and Thresholding (C+T) is a widely used method to derive polygenic scores. When using C+T, it is common to test several p-value thresholds to maximize predictive ability of the derived polygenic scores. Along with this p-value threshold, we propose to tune three other hyper-parameters for C+T. We implement an efficient way to derive thousands of different C+T polygenic scores corresponding to a grid over four hyper-parameters. For example, it takes a few hours to derive 123,200 different C+T scores for 300K individuals and 1M variants on a single node with 16 cores. We find that optimizing over these four hyper-parameters improves the predictive performance of C+T in both simulations and real data applications as compared to tuning only the p-value threshold. A particularly large increase can be noted when predicting depression status, from an AUC of 0.557 (95% CI: [0.544-0.569]) when tuning only the p-value threshold in C+T to an AUC of 0.592 (95% CI: [0.580-0.604]) when tuning all four hyper-parameters we propose for C+T. We further propose Stacked Clumping and Thresholding (SCT), a polygenic score that results from stacking all derived C+T scores. Instead of choosing one set of hyper-parameters that maximizes prediction in some training set, SCT learns an optimal linear combination of all C+T scores by using an efficient penalized regression. We apply SCT to 8 different case-control diseases in the UK Biobank data and find that SCT substantially improves prediction accuracy with an average AUC increase of 0.035 over standard C+T.
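
The two ideas in the abstract, a grid of C+T scores and a stacked combination of them, can be illustrated with a small sketch. This is not the authors' implementation. It assumes synthetic genotypes and placeholder summary statistics, and it tunes only two illustrative hyper-parameters (an r² clumping threshold and a p-value threshold), since the abstract does not name the other hyper-parameters. It treats all variants as one clumping window and swaps in a closed-form ridge regression for the penalized regression mentioned above. All function names (`clump`, `ct_scores`, `stack_ridge`) are hypothetical.

```python
import numpy as np

def clump(G, pvals, r2_thr):
    """Greedy clumping: visit variants from most to least significant and
    discard any variant whose squared correlation with a kept one exceeds r2_thr."""
    r2 = np.corrcoef(G, rowvar=False) ** 2
    removed = np.zeros(G.shape[1], dtype=bool)
    kept = []
    for j in np.argsort(pvals):
        if removed[j]:
            continue
        kept.append(j)
        removed |= r2[j] > r2_thr  # also flags j itself, which is already kept
    return np.array(kept)

def ct_scores(G, betas, pvals, r2_grid, p_grid):
    """One C+T score per (r2 threshold, p-value threshold) pair."""
    scores, params = [], []
    for r2_thr in r2_grid:
        kept = clump(G, pvals, r2_thr)
        for p_thr in p_grid:
            sel = kept[pvals[kept] <= p_thr]       # thresholding step
            scores.append(G[:, sel] @ betas[sel])  # all zeros if sel is empty
            params.append((r2_thr, p_thr))
    return np.column_stack(scores), params

def stack_ridge(S, y, lam=1.0):
    """Ridge weights for a linear combination of the score columns
    (an assumed stand-in for the penalized regression used by SCT)."""
    Sc = S - S.mean(axis=0)
    yc = y - y.mean()
    return np.linalg.solve(Sc.T @ Sc + lam * np.eye(S.shape[1]), Sc.T @ yc)

# Synthetic example: 200 individuals, 30 variants coded 0/1/2.
rng = np.random.default_rng(0)
n, m = 200, 30
G = rng.integers(0, 3, size=(n, m)).astype(float)
G[:, 1] = G[:, 0]                        # variant 1 is in perfect LD with variant 0
betas = rng.normal(scale=0.1, size=m)    # placeholder GWAS effect sizes
pvals = rng.uniform(size=m)              # placeholder GWAS p-values
pvals[0], pvals[1] = 1e-8, 1e-6
y = G @ betas + rng.normal(size=n)       # simulated continuous phenotype

r2_grid = [0.2, 0.5, 0.8]
p_grid = [1e-5, 1e-2, 1.0]
S, params = ct_scores(G, betas, pvals, r2_grid, p_grid)
weights = stack_ridge(S, y)
stacked_score = (S - S.mean(axis=0)) @ weights
```

Standard C+T would pick the single column of `S` that predicts best in a training set, whereas the stacked score uses all columns at once. In practice the weights would be learned on training individuals and evaluated on held-out individuals; the sketch fits and scores on the same data only to stay short.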

Details

Language :
English
ISSN :
00029297 and 15376605
Database :
OpenAIRE
Journal :
American Journal of Human Genetics (Am J Hum Genet)
Accession number :
edsair.doi.dedup.....477fe48dcfd1abdc7996cb2d6df19263