1. An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data
- Author
- Peter R. Wilton, Rasmus Nielsen, Aaron J. Stern, and Ryan D. Hernandez
- Subjects
Cancer Research; Heredity; Computer science; Population Dynamics; Markov models; QH426-470; Geographical Locations; 0302 clinical medicine; Gene Frequency; Models; Natural Selection; Hidden Markov models; Genetics (clinical); Data Management; Likelihood Functions; 0303 health sciences; Natural selection; Edar Receptor; Pigmentation; Simulation and Modeling; Selection coefficient; Phylogenetic Analysis; Markov Chains; Asians; Phylogenetics; Physical sciences; Europe; Genetic Mapping; Likelihood function; Sequence Analysis; Monte Carlo Method; Algorithm; Research Article; Asian Continental Ancestry Group; Computer and Information Sciences; Evolutionary Processes; Population; European Continental Ancestry Group; Biology; Research and Analysis Methods; DNA sequencing; White People; Molecular Genetics; 03 medical and health sciences; Genetic; Asian People; Genetics; Humans; Evolutionary Systematics; Allele; education; Gene; Molecular Biology; Allele frequency; Alleles; Ecology, Evolution, Behavior and Systematics; Selection (genetic algorithm); Taxonomy; 030304 developmental biology; Evolutionary Biology; Base Sequence; Models, Genetic; Population Biology; Whites; Biology and Life Sciences; Probability theory; Markov chain Monte Carlo; DNA; Sequence Analysis, DNA; Minichromosome Maintenance Complex Component 6; Geographic Distribution; Haplotypes; People and Places; Generic health relevance; Selective sweep; Mathematics; 030217 neurology & neurosurgery; Importance sampling; Developmental Biology
- Abstract
Most current methods for detecting natural selection from DNA sequence data are limited in that they are either based on summary statistics or a composite likelihood, and as a consequence, do not make full use of the information available in DNA sequence data. We here present a new importance sampling approach for approximating the full likelihood function for the selection coefficient. Our method CLUES treats the ancestral recombination graph (ARG) as a latent variable that is integrated out using previously published Markov Chain Monte Carlo (MCMC) methods. The method can be used for detecting selection, estimating selection coefficients, testing models of changes in the strength of selection, estimating the time of the start of a selective sweep, and for inferring the allele frequency trajectory of a selected or neutral allele. We perform extensive simulations to evaluate the method and show that it uniformly improves power to detect selection compared to current popular methods such as nSL and SDS, and can provide reliable inferences of allele frequency trajectories under many conditions. We also explore the potential of our method to detect extremely recent changes in the strength of selection. We use the method to infer the past allele frequency trajectory for a lactase persistence SNP (MCM6) in Europeans. We also infer the trajectory of a SNP (EDAR) in Han Chinese, finding evidence that this allele’s age is much older than previously claimed. We also study a set of 11 pigmentation-associated variants. Several genes show evidence of strong selection particularly within the last 5,000 years, including ASIP, KITLG, and TYR. However, selection on OCA2/HERC2 seems to be much older and, in contrast to previous claims, we find no evidence of selection on TYRP1.

Author summary

Current methods to study natural selection using modern population genomic data are limited in their power and flexibility. Here, we present a new method to infer natural selection that builds on recent methodological advances in estimating genome-wide genealogies. By using importance sampling we are able to efficiently estimate the likelihood function of the selection coefficient. We show our method improves power to test for selection over competing methods across a diverse range of scenarios, and also accurately infers the selection coefficient. We also demonstrate a novel capability of our model, using it to infer the allele’s frequency over time. We validate these results with a study of a lactase persistence SNP in Europeans, and also study a SNP at EDAR, as well as a set of 11 pigmentation-associated variants.
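
The abstract describes estimating the likelihood of the selection coefficient by importance sampling over a latent variable (the ARG). The sketch below illustrates only that general idea on a toy Gaussian model; it is not the CLUES implementation. It assumes a latent value `z` whose prior depends on a parameter `theta` (standing in for the selection coefficient) and data `x` that depend on `z` alone (standing in for sequence data given the genealogy). Under that assumption, the likelihood ratio against a reference value `theta0` equals the average of the prior ratio p(z | theta) / p(z | theta0) over latent values drawn from the posterior under `theta0`. All function and parameter names are hypothetical.

```python
import math
import random


def normal_logpdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def is_likelihood_ratio(x, theta, theta0, num_samples=20000, seed=0):
    """Importance-sampling estimate of L(theta) / L(theta0).

    Toy model (an assumption for illustration): z ~ N(theta, 1), x | z ~ N(z, 1).
    Latent values are drawn from the posterior under theta0, which for this
    toy model is N((x + theta0) / 2, 1/2). Because x depends on z only, each
    importance weight reduces to p(z | theta) / p(z | theta0).
    """
    rng = random.Random(seed)
    post_mean = (x + theta0) / 2
    post_sd = math.sqrt(0.5)
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(post_mean, post_sd)
        log_weight = normal_logpdf(z, theta, 1.0) - normal_logpdf(z, theta0, 1.0)
        total += math.exp(log_weight)
    return total / num_samples


def exact_likelihood_ratio(x, theta, theta0):
    """Closed-form ratio for the toy model, where marginally x ~ N(theta, 2)."""
    return math.exp(normal_logpdf(x, theta, 2.0) - normal_logpdf(x, theta0, 2.0))


if __name__ == "__main__":
    x_obs = 1.0
    print("estimate:", is_likelihood_ratio(x_obs, theta=0.5, theta0=0.0))
    print("exact:   ", exact_likelihood_ratio(x_obs, theta=0.5, theta0=0.0))
```

In the toy model the posterior can be sampled directly; in the setting the abstract describes, the corresponding samples would instead come from an MCMC sampler over genealogies, and the closed-form check would not be available.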
- Published
- 2019