1. Using population-specific add-on polymorphisms to improve genotype imputation in underrepresented populations
- Author
Xu, Zhi Ming, Rüeger, Sina, Zwyer, Michaela, Brites, Daniela, Hiza, Hellen, Reinhard, Miriam, Rutaihwa, Liliana, Borrell, Sonia, Isihaka, Faima, Temba, Hosiana, Maroa, Thomas, Naftari, Rastard, Hella, Jerry, Sasamalo, Mohamed, Reither, Klaus, Portevin, Damien, Gagneux, Sebastien, and Fellay, Jacques
- Subjects
Male; Genotyping; Heredity; Genotype; QH301-705.5; Single Nucleotide Polymorphisms; Research and Analysis Methods; Polymorphism, Single Nucleotide; Tanzania; Chromosomes; Cellular and Molecular Neuroscience; Genome-Wide Association Studies; Genetics; Humans; Biology (General); Molecular Biology Techniques; Molecular Biology; Ecology, Evolution, Behavior and Systematics; Evolutionary Biology; Sex Chromosomes; Computational Biology/methods; Genetics, Population/methods; Genetics, Population/standards; Genome-Wide Association Study/methods; Genome-Wide Association Study/standards; Polymorphism, Single Nucleotide/genetics; Population Biology; Ecology; Chromosome Biology; Computational Biology; Biology and Life Sciences; Human Genetics; Genomics; Cell Biology; Genome Analysis; Y Chromosomes; Genetic Mapping; Genetics, Population; Haplotypes; Computational Theory and Mathematics; Modeling and Simulation; Haplogroups; Population Genetics; Genome-Wide Association Study; Research Article
- Abstract
Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures, hence the full extent of common genetic variation. Here, we propose to sequence the full genomes of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on tag SNPs and to generate an internal population-specific imputation reference panel, such that the remaining array-genotyped cohort could be more accurately imputed. Using a Tanzania-based cohort as a proof-of-concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on tags to the base H3Africa array.
Author summary: Genome-wide association studies, which study the association between genetic variants and various phenotypes, typically rely on genotyping arrays. Only a small proportion of genetic variants within the genome are typed on genotyping arrays. Untyped variants are statistically inferred through a process known as genotype imputation, where correlations between variants (haplotypes) observed in external reference panels are leveraged to infer untyped variants in the study population. However, for study populations that are underrepresented in existing reference panels, the quality of imputation is often sub-optimal. This is because typed variants incorporated on existing genotyping arrays can be unsuitable for the study population, and haplotype structures can be different between the reference and the study population. Here, we illustrate an approach to select a custom set of population-specific typed variants to improve genotype imputation in such underrepresented populations.
- Published
2022