1. Modelling and inferring neutral and non-neutral forces shaping genomic site frequencies
- Author
- Mikula, Lynette Caitlin and Kosiol, Carolin
- Subjects
Probabilistic models for genetic evolution, Diffusion model for population demography (orthogonal polynomial diffusion), Moran model with low-scaled mutation rates and selection (directional and balancing), Population genetic landscape of orangutans (GC content and recombination), Genome segmentation algorithm (ordered Hidden Markov Model with emission densities)
- Abstract
Single nucleotide polymorphisms in samples of DNA sequences from one or multiple populations can be summarised as site frequency spectra. Since polymorphic sites are known to be predominantly biallelic, models for the evolution of allele frequencies that assume low scaled mutation rates are justified. The biallelic boundary-mutation Moran model with reversible mutations (BMM) arises as an approximation to the classic Moran model under this consideration, and it underpins this PhD thesis. In the introduction, the BMM is presented as a mathematically tractable model that is efficient in its use of site frequency data for inferring mutation and selection parameters. Chapter 2 of this thesis extends the BMM to include balancing selection, in addition to biased mutations and a directional component (e.g., directional selection or biased gene conversion). In Chapter 3, discrete and stochastic demographic changes are incorporated into the spectral representation of the neutral BMM. A Hidden Markov Model-inspired approach is used to simulate sample spectra under different scenarios, and to propose a new inference method. A novel class of Hidden Markov Models with ordered hidden states and emission densities (oHMMed) is introduced in Chapter 4 alongside the source code of a corresponding R-package. In Chapter 5, oHMMed is used to annotate the genome of orangutans according to average levels of GC content and recombination rates. Site frequency spectra of similar regions are subjected to Markov Chain Monte Carlo analyses based on the BMM, and to demographic inference per Chapter 3. They are further characterised by structural genomic features. Overall, this provides a quantification of how biased gene conversion and recombination shape the background variation in hominid site frequency data. Utilised conjointly, the methods developed in this thesis could help inform an extended null model of evolution, and improve genome scans.
- Published
- 2023