UKB.COVID19: an R package for UK Biobank COVID-19 data processing and analysis [version 2; peer review: 1 approved, 1 not approved]

Authors :
Longfei Wang
Victoria E Jackson
Liam G Fearnley
Melanie Bahlo
Author Affiliations :
1. Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia
2. Department of Medical Biology, The University of Melbourne, Parkville, VIC, 3010, Australia
Source :
F1000Research. 10:830
Publication Year :
2022
Publisher :
London, UK: F1000 Research Limited, 2022.

Abstract

COVID-19, caused by SARS-CoV-2, has resulted in a global pandemic with a rapidly developing global health and economic crisis. Variations in the disease have been observed and have been associated with the genomic sequence of either the human host or the pathogen. Scientists worldwide initially scrambled to recruit patient cohorts to try to identify risk factors. A resource that presented itself early on was the UK Biobank (UKBB), which investigates the respective contributions of genetic predisposition and environmental exposure to the development of disease. To enable COVID-19 studies, UKBB now receives COVID-19 test data for its participants every two weeks. In addition, UKBB delivers more frequent updates of death and hospital inpatient data (including critical care admissions) on the UKBB Data Portal. This frequently changing dataset requires a tool that can rapidly process and analyse up-to-date data. We developed an R package specifically for the UKBB COVID-19 data. It summarises COVID-19 test results, performs association tests between COVID-19 susceptibility/severity and potential risk factors such as age, sex, blood type and comorbidities, and generates input files for genome-wide association studies (GWAS). By applying the R package to data released in April 2021, we found that age, body mass index, socioeconomic status and smoking are positively associated with COVID-19 susceptibility, severity, and mortality. Males are at a higher risk of COVID-19 infection than females. People staying in aged care homes have a higher chance of being exposed to SARS-CoV-2. By performing GWAS, we replicated the 3p21.31 genetic finding for COVID-19 susceptibility and severity. The ability to perform such analyses iteratively is highly relevant because the UKBB data are updated frequently. As a caveat, users must arrange their own access to the UKBB data to use the R package.
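
As a rough illustration of what an association test between a binary risk factor and a binary COVID-19 outcome involves, the sketch below computes an unadjusted odds ratio with a Woolf (log-based) 95% confidence interval from a 2x2 table. It is written in Python for illustration only and is not the UKB.COVID19 package's API or necessarily the method the authors use; the function name `odds_ratio_with_ci` and the cell counts are invented for the example and are not UKBB results.

```python
import math


def odds_ratio_with_ci(exposed_cases, exposed_controls,
                       unexposed_cases, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-method) confidence interval.

    Hypothetical helper for illustration; z=1.96 gives an approximate 95% CI.
    """
    counts = (exposed_cases, exposed_controls,
              unexposed_cases, unexposed_controls)
    if any(c <= 0 for c in counts):
        raise ValueError("all four cell counts must be positive")

    # Odds of being a case among the exposed divided by the odds among the unexposed.
    odds_ratio = (exposed_cases * unexposed_controls) / (
        exposed_controls * unexposed_cases)

    # Standard error of log(OR) is the square root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts: 30 of 100 exposed and 20 of 100 unexposed tested positive.
or_value, ci_lower, ci_upper = odds_ratio_with_ci(30, 70, 20, 80)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
```

An analysis that adjusts for covariates such as age and sex would typically fit a logistic regression rather than use a single 2x2 table; the sketch only shows the unadjusted case.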

Details

ISSN :
20461402
Volume :
10
Database :
F1000Research
Journal :
F1000Research
Notes :
Revised. Amendments from Version 1: The newly revised article contains additional information as suggested by the reviewers, which includes 1) how comorbidities are retrieved, classified and analysed; 2) how we classify severity, why we include all COVID-19 patients for severity phenotypes and why we convert severity phenotypes into multiple binary variables instead of analysing them as an ordinal variable; 3) a clarification that mortality is defined as death "due to" COVID-19, not "with" COVID-19.
Publication Type :
Academic Journal
Accession number :
edsfor.10.12688.f1000research.55370.2
Document Type :
software-tool
Full Text :
https://doi.org/10.12688/f1000research.55370.2