Big data approaches to microbial genomics
- Publication Year :
- 2022
- Publisher :
- University of Oxford, 2022.
Abstract
- Alongside tremendous challenges in infectious diseases, such as the rise of antimicrobial resistance and the coronavirus disease pandemic, the 21st century is also witness to the big data revolution, which offers opportunities to design methodology capable of addressing these great challenges. When developing tools, there are two competing philosophies of how to gain insight from big data: the modelling approach, where the natural data-generating mechanism is approximated by statistical inference, and the algorithmic approach, where general-purpose algorithms are tuned to capture hidden structure in the data for prediction. The aim of this thesis is to contribute to solving existing infectious disease problems by motivating, designing and applying the appropriate big data methodology, whilst facilitating future use by generating applications that can be easily re-purposed. I first design a machine learner that predicts the source of campylobacteriosis 33% more accurately than the most commonly used previous methods. My method broadens the data input to whole genomes, which uniquely allows sources to be assigned to individual samples and reveals a shift in host affinity in one of the most common lineages of Campylobacter jejuni. Based on the machine learner's individual predictions, I infer which genetic changes are associated with host specificity by conducting a genome-wide association study. I find that fluoroquinolone resistance genes pre-adapt chicken isolates to human infection, and that polyphosphate pathway-associated genes distinguish adaptation to the chicken and ruminant niches. To study COVID-19 risk, I use machine learning to predict very severe disease, hospitalisation and susceptibility, whilst also inferring risk factors for all three phenotypes by applying Bayesian model averaging.
I rediscover commonly identified risk factors relating to socio-economic standing, ill health and ethnicity, whilst also discovering more novel factors, such as previous lung injury predisposing to very severe COVID-19, and I bring order to the wealth of published COVID-19 risk studies. In the closing arguments, I set out the limitations of my work and give recommendations on how the developed tools can be re-applied to make big data research more accessible. I also expand on how statistical inference and machine learning prediction can be used in unison to tap into the potential of big data to address the foremost infectious disease challenges of our time.
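The abstract names Bayesian model averaging (BMA) as the tool used to infer risk factors. As a minimal sketch only, and not the method used in the thesis, the following assumes a Gaussian linear model, a uniform prior over models and the BIC approximation to each model's marginal likelihood (posterior model weight proportional to exp(-BIC/2)). It enumerates every subset of candidate predictors and returns each predictor's posterior inclusion probability. All function names are hypothetical.

```python
from itertools import combinations
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus the chosen columns."""
    design = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations (X'X) beta = X'y
    return sum((yi - sum(bj * v for bj, v in zip(beta, d))) ** 2
               for d, yi in zip(design, y))


def bma_inclusion_probabilities(X, y):
    """Posterior inclusion probability of each predictor.

    Assumptions (illustrative, not from the thesis): Gaussian linear model,
    uniform prior over all 2**p models, BIC approximation to the evidence.
    """
    n, p = len(X), len(X[0])
    models, bics = [], []
    for size in range(p + 1):
        for cols in combinations(range(p), size):
            r = max(rss(X, y, cols), 1e-12)  # guard against log(0) on a perfect fit
            # Gaussian BIC up to a constant shared by all models; k = intercept + predictors
            bics.append(n * math.log(r / n) + (size + 1) * math.log(n))
            models.append(cols)
    best = min(bics)
    weights = [math.exp(-(b - best) / 2) for b in bics]  # shift by best BIC for numerical stability
    total = sum(weights)
    return [sum(w for w, cols in zip(weights, models) if j in cols) / total
            for j in range(p)]
```

Because exhaustive enumeration grows as 2^p models, this sketch only suits a handful of candidate factors; with many predictors, BMA is usually paired with a stochastic model search such as Markov chain Monte Carlo model composition (MC3) instead.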
- Subjects :
- Genomics; Bioinformatics
Details
- Language :
- English
- Database :
- British Library EThOS
- Publication Type :
- Dissertation/Thesis
- Accession number :
- edsble.864898
- Document Type :
- Electronic Thesis or Dissertation