
Accelerating Genome- and Phenome-Wide Association Studies using GPUs - A case study using data from the Million Veteran Program.

Authors :
Rodriguez A
Kim Y
Nandi TN
Keat K
Kumar R
Bhukar R
Conery M
Liu M
Hessington J
Maheshwari K
Schmidt D
Begoli E
Tourassi G
Muralidhar S
Natarajan P
Voight BF
Cho K
Gaziano JM
Damrauer SM
Liao KP
Zhou W
Huffman JE
Verma A
Madduri RK
Source :
BioRxiv : the preprint server for biology [bioRxiv] 2024 May 22. Date of Electronic Publication: 2024 May 22.
Publication Year :
2024

Abstract

The expansion of biobanks has significantly propelled genomic discoveries, yet the sheer scale of data within these repositories poses formidable computational hurdles, particularly in handling the extensive matrix operations required by prevailing statistical frameworks. In this work, we introduce computational optimizations to the SAIGE (Scalable and Accurate Implementation of Generalized Mixed Model) algorithm, notably employing a GPU-based distributed computing approach to tackle these challenges. We applied these optimizations to conduct a large-scale genome-wide association study (GWAS) across 2,068 phenotypes derived from electronic health records of 635,969 diverse participants from the Veterans Affairs (VA) Million Veteran Program (MVP). Our strategies enabled scaling up the analysis to over 6,000 nodes on the Department of Energy (DOE) Oak Ridge Leadership Computing Facility (OLCF) Summit High-Performance Computer (HPC), resulting in a 20-fold acceleration compared to the baseline model. We also provide a Docker container with our optimizations that was successfully used on multiple cloud infrastructures with the UK Biobank and All of Us datasets, where we showed significant time and cost benefits over the baseline SAIGE model.

Details

Language :
English
ISSN :
2692-8205
Database :
MEDLINE
Journal :
BioRxiv : the preprint server for biology
Publication Type :
Academic Journal
Accession number :
38826407
Full Text :
https://doi.org/10.1101/2024.05.17.594583