1. Investigating infectious diseases using genetic data
- Author
Lin, ShangKuan, Wilson, Daniel, Mentzer, Alexander, and Ansari, Mohamad
- Subjects
Infectious diseases, Hepatitis B virus, Hepatitis C virus, Computational biology, Bioinformatics
- Abstract
Infectious diseases have been a major cause of mortality and morbidity throughout human history. The dramatic advances in technology, especially in genome sequencing, have opened a new avenue to study infectious diseases and changed the landscape of public health and medical science. Compared to non-communicable human diseases, where human genetics is the sole focus of genetic studies, infectious diseases add another layer of complexity due to the host-pathogen dynamic. Since infection can be seen as the result of the complex interplay of host, pathogen and environment, a comprehensive understanding requires the incorporation of all these factors. In this thesis, I explore the different applications of genetic data in studying infectious diseases through the lens of host-pathogen dynamics and their clinical or public health implications. I first used virus whole-genome data to reconstruct the epidemiological history of Hepatitis C virus subtype 3a (HCV-3a). I showed how World War II had likely led to the spread of HCV-3a from South Asia and to its emergence as a global epidemic. Notably, I also incorporated host genetic data into the analysis and showed how it helped to identify and address the confounding factor of human migration when conducting an evolutionary analysis of HCV-3a. I then integrated previously published methods for studying protein residue covariation to investigate signals of co-evolution within the Hepatitis B virus (HBV). I built a co-evolution profile for HBV and linked it to previous studies to demonstrate that drug resistance is one of the major driving forces of protein residue co-evolution in HBV.
Lastly, I explored the potential of integrating national-scale databases that collect pathogen data (Second Generation Surveillance System) and host data (UK Biobank), conducted multiple genome-wide association studies (GWASs), and laid out how such integration can help improve our understanding of human genetic risk factors for infectious diseases. To conclude, I discuss the importance of host-pathogen data integration as well as the need to develop statistical models that can account for both the data volume and the complexity resulting from such integration.
- Published
- 2023