53 results on '"Bruce S. Weir"'
Search Results
2. Stationary distribution of the linkage disequilibrium coefficient r²
- Author
-
Jesse Goodman, Jing Liu, Bruce S. Weir, Rachel M. Fewster, and Wei Zhang
- Subjects
0106 biological sciences ,0301 basic medicine ,Population ,Probability density function ,010603 evolutionary biology ,01 natural sciences ,Article ,Linkage Disequilibrium ,03 medical and health sciences ,Quantitative Biology::Populations and Evolution ,Statistical physics ,education ,Alleles ,Ecology, Evolution, Behavior and Systematics ,Mathematics ,education.field_of_study ,Models, Statistical ,Stationary distribution ,Principle of maximum entropy ,Sampling (statistics) ,Quantitative Biology::Genomics ,Genetics, Population ,030104 developmental biology ,Sampling distribution ,Genetic Loci ,Mutation (genetic algorithm) ,Probability distribution ,Algorithms
- Abstract
The linkage disequilibrium coefficient r² is a measure of statistical dependence of the alleles possessed by an individual at different genetic loci. It is widely used in association studies to search for the locations of disease-causing genes on chromosomes. Most studies to date treat r² as a fixed property of two loci in a finite population, and investigate the sampling distribution of estimators due to the statistical sampling of individuals from the population. Here, we instead consider the distribution of r² itself under a process of genetic sampling through the generations. Using a classical two-locus model for genetic drift, mutation, and recombination, we investigate the probability density function of r² at stationarity. This density function provides a tool for inference on evolutionary parameters such as mutation and recombination rates. We reconstruct the approximate stationary density of r² by calculating a finite sequence of the distribution’s moments and applying the maximum entropy principle. Our approach is based on the diffusion approximation, under which we demonstrate that for certain models in population genetics, moments of the stationary distribution can be obtained without knowing the probability distribution itself. To illustrate our approach, we show how the stationary probability density of r² can be used in a maximum likelihood framework to estimate mutation and recombination rates from sample data of r².
- Published
- 2019
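Note on result 2: the abstract treats r² as a quantity computed from two-locus allele and haplotype frequencies. As a minimal sketch, assuming two diallelic loci and the standard textbook definition r² = D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA·pB, the coefficient can be computed as follows. The function name `ld_r2` and its arguments are hypothetical and are not taken from the paper.

```python
def ld_r2(p_ab, p_a, p_b):
    """Squared allelic correlation r^2 for two diallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2)
    All names are illustrative assumptions, not from the cited work.
    """
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        # r^2 is undefined when either locus is monomorphic
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


if __name__ == "__main__":
    # Haplotype AB at 0.4 with both allele frequencies 0.5 gives D = 0.15
    print(ld_r2(0.4, 0.5, 0.5))
```

This only evaluates r² for given frequencies; it does not model the stationary distribution studied in the paper.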
3. Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data
- Author
-
Pierrick Wainschtein, Deepti Jain, Zhili Zheng, L. Adrienne Cupples, Aladdin H. Shadyab, Barbara McKnight, Benjamin M. Shoemaker, Braxton D. Mitchell, Bruce M. Psaty, Charles Kooperberg, Ching-Ti Liu, Christine M. Albert, Dan Roden, Daniel I. Chasman, Dawood Darbar, Donald M. Lloyd-Jones, Donna K. Arnett, Elizabeth A. Regan, Eric Boerwinkle, Jerome I. Rotter, Jeffrey R. O'Connell, Lisa R. Yanek, Mariza de Andrade, Matthew A. Allison, Merry-Lynn N. McDonald, Mina K. Chung, Myriam Fornage, Nathalie Chami, Nicholas L. Smith, Patrick T. Ellinor, Ramachandran S. Vasan, Rasika A. Mathias, Ruth J. F. Loos, Stephen S. Rich, Steven A. Lubitz, Susan R. Heckbert, Susan Redline, Xiuqing Guo, Y.-D. Ida Chen, Cecelia A. Laurie, Ryan D. Hernandez, Stephen T. McGarvey, Michael E. Goddard, Cathy C. Laurie, Kari E. North, Leslie A. Lange, Bruce S. Weir, Loic Yengo, Jian Yang, and Michael Zody
- Subjects
Multifactorial Inheritance ,Humans ,Polymorphism, Single Nucleotide ,Alleles ,Linkage Disequilibrium ,Genome-Wide Association Study
- Abstract
Analyses of data from genome-wide association studies on unrelated individuals have shown that, for human traits and diseases, approximately one-third to two-thirds of heritability is captured by common SNPs. However, it is not known whether the remaining heritability is due to the imperfect tagging of causal variants by common SNPs, in particular whether the causal variants are rare, or whether it is overestimated due to bias in inference from pedigree data. Here we estimated heritability for height and body mass index (BMI) from whole-genome sequence data on 25,465 unrelated individuals of European ancestry. The estimated heritability was 0.68 (standard error 0.10) for height and 0.30 (standard error 0.10) for body mass index. Low minor allele frequency variants in low linkage disequilibrium (LD) with neighboring variants were enriched for heritability, to a greater extent for protein-altering variants, consistent with negative selection. Our results imply that rare variants, in particular those in regions of low linkage disequilibrium, are a major source of the still missing heritability of complex traits and disease.
- Published
- 2021
4. Detection and quantification of inbreeding depression for complex traits from SNP data
- Author
-
Naomi R. Wray, Bruce S. Weir, Zhihong Zhu, Peter M. Visscher, Jian Yang, Matthew R. Robinson, and Loic Yengo
- Subjects
0106 biological sciences ,0301 basic medicine ,Genetics ,Linkage disequilibrium ,Multidisciplinary ,Single-nucleotide polymorphism ,Quantitative genetics ,Biological Sciences ,Biology ,010603 evolutionary biology ,01 natural sciences ,Genetic architecture ,Minor allele frequency ,03 medical and health sciences ,030104 developmental biology ,Inbreeding depression ,SNP ,Inbreeding
- Abstract
Quantifying the effects of inbreeding is critical to characterizing the genetic architecture of complex traits. This study highlights through theory and simulations the strengths and shortcomings of three SNP-based inbreeding measures commonly used to estimate inbreeding depression (ID). We demonstrate that heterogeneity in linkage disequilibrium (LD) between causal variants and SNPs biases ID estimates, and we develop an approach to correct this bias using LD and minor allele frequency stratified inference (LDMS). We quantified ID in 25 traits measured in [Formula: see text] participants of the UK Biobank, using LDMS, and confirmed previously published ID for 4 traits. We find unique evidence of ID for handgrip strength, waist/hip ratio, and visual and auditory acuity (ID between -2.3 and -5.2 phenotypic SDs for complete inbreeding; [Formula: see text]). Our results illustrate that a careful choice of the measure of inbreeding combined with LDMS stratification improves both detection and quantification of ID using SNP data.
- Published
- 2017
5. Recovery of trait heritability from whole genome sequence data
- Author
-
Stephen S. Rich, Xiuqing Guo, Bruce M. Psaty, Mina K. Chung, Aladdin H. Shadyab, Nicholas L. Smith, Dan M. Roden, Christine M. Albert, Braxton D. Mitchell, Y.-D. Ida Chen, Matthew A. Allison, Loic Yengo, Barbara McKnight, Jerome I. Rotter, Leslie A. Lange, Mariza de Andrade, Patrick T. Ellinor, Charles Kooperberg, Cathy C. Laurie, Merry-Lynn McDonald, Ramachandran S. Vasan, Ryan D. Hernandez, Kari E. North, Peter M. Visscher, Pierrick Wainschtein, Rasika A. Mathias, Zhili Zheng, Bruce S. Weir, L. Adrienne Cupples, Lisa R. Yanek, Donna K. Arnett, Stephen T. McGarvey, Elizabeth A. Regan, Jian Yang, Ching-Ti Liu, Dawood Darbar, Eric Boerwinkle, Deepti Jain, Susan R. Heckbert, Susan Redline, and Benjamin Shoemaker
- Subjects
Whole genome sequencing ,2. Zero hunger ,Genetics ,0303 health sciences ,Linkage disequilibrium ,business.industry ,Genome-wide association study ,Single-nucleotide polymorphism ,Heritability ,Biology ,Genetic architecture ,Minor allele frequency ,03 medical and health sciences ,Text mining ,0302 clinical medicine ,Missing heritability problem ,Trait ,business ,030217 neurology & neurosurgery ,030304 developmental biology ,Genetic association
- Abstract
Heritability, the proportion of phenotypic variance explained by genetic factors, can be estimated from pedigree data 1, but such estimates are uninformative with respect to the underlying genetic architecture. Analyses of data from genome-wide association studies (GWAS) on unrelated individuals have shown that for human traits and disease, approximately one-third to two-thirds of heritability is captured by common SNPs 2–5. It is not known whether the remaining heritability is due to the imperfect tagging of causal variants by common SNPs, in particular if the causal variants are rare, or other reasons such as overestimation of heritability from pedigree data. Here we show that pedigree heritability for height and body mass index (BMI) appears to be largely recovered from whole-genome sequence (WGS) data on 25,465 unrelated individuals of European ancestry. We assigned 33.7 million genetic variants to groups based upon their minor allele frequencies (MAF) and linkage disequilibrium (LD) with variants nearby, and estimated and partitioned genetic variance accordingly. The estimated heritability was 0.68 (SE 0.10) for height and 0.30 (SE 0.10) for BMI, with a range of ~0.60 – 0.71 for height and ~0.25 – 0.35 for BMI, depending on quality control and analysis strategies. Low-MAF variants in low LD with neighbouring variants were enriched for heritability, to a greater extent for protein-altering variants, consistent with negative selection thereon. Cumulatively, variants with 0.0001 < MAF < 0.1 explained 0.47 (SE 0.07) and 0.30 (SE 0.10) of heritability for height and BMI, respectively. Our results imply that rare variants, in particular those in regions of low LD, are a major source of the still missing heritability of complex traits and disease.
- Published
- 2019
6. Eigenanalysis of SNP data with an identity by descent interpretation
- Author
-
Bruce S. Weir and Xiuwen Zheng
- Subjects
0301 basic medicine ,Linkage disequilibrium ,IBD ,SNP ,Genome-wide association study ,HapMap Project ,Admixture ,Biology ,Identity by descent ,Article ,Consanguinity ,03 medical and health sciences ,Bayes' theorem ,Gene Frequency ,Population Groups ,Statistics ,Quantitative Biology::Populations and Evolution ,Humans ,Coancestry ,International HapMap Project ,Ecology, Evolution, Behavior and Systematics ,Genetic association ,Principal Component Analysis ,PCA ,Models, Genetic ,Genetic Variation ,Bayes Theorem ,Quantitative Biology::Genomics ,Genetics, Population ,030104 developmental biology ,Evolutionary biology ,Principal component analysis ,Relatedness
- Abstract
Principal component analysis (PCA) is widely used in genome-wide association studies (GWAS), and the principal component axes often represent perpendicular gradients in geographic space. The explanation of PCA results is of major interest for geneticists to understand fundamental demographic parameters. Here, we provide an interpretation of PCA based on relatedness measures, which are described by the probability that sets of genes are identical-by-descent (IBD). An approximately linear transformation between ancestral proportions (AP) of individuals with multiple ancestries and their projections onto the principal components is found. In addition, a new method of eigenanalysis “EIGMIX” is proposed to estimate individual ancestries. EIGMIX is a method of moments with computational efficiency suitable for millions of SNP data, and it is not subject to the assumption of linkage equilibrium. With the assumptions of multiple ancestries and their surrogate ancestral samples, EIGMIX is able to infer ancestral proportions (APs) of individuals. The methods were applied to the SNP data from the HapMap Phase 3 project and the Human Genome Diversity Panel. The APs of individuals inferred by EIGMIX are consistent with the findings of the program ADMIXTURE. In conclusion, EIGMIX can be used to detect population structure and estimate genome-wide ancestral proportions with a relatively high accuracy.
- Published
- 2016
7. Imputation-Based Genomic Coverage Assessments of Current Human Genotyping Arrays
- Author
-
Elizabeth W. Pugh, Kimberly F. Doheny, Sarah C. Nelson, Cathy C. Laurie, Sharon R. Browning, Bruce S. Weir, Hua Ling, Jane Romm, and Cecelia A. Laurie
- Subjects
Linkage disequilibrium ,Genotyping Techniques ,Genome-wide association study ,Computational biology ,Investigations ,Biology ,Sensitivity and Specificity ,power ,03 medical and health sciences ,0302 clinical medicine ,Gene Frequency ,SNP microarrays ,Genetics ,Humans ,1000 Genomes Project ,Molecular Biology ,Allele frequency ,Genotyping ,Genetics (clinical) ,Oligonucleotide Array Sequence Analysis ,030304 developmental biology ,0303 health sciences ,genome-wide association study ,Genome, Human ,genomic coverage ,Minor allele frequency ,030217 neurology & neurosurgery ,Imputation (genetics)
- Abstract
Microarray single-nucleotide polymorphism genotyping, combined with imputation of untyped variants, has been widely adopted as an efficient means to interrogate variation across the human genome. “Genomic coverage” is the total proportion of genomic variation captured by an array, either by direct observation or through an indirect means such as linkage disequilibrium or imputation. We have performed imputation-based genomic coverage assessments of eight current genotyping arrays that assay from ~0.3 to ~5 million variants. Coverage was determined separately in each of the four continental ancestry groups in the 1000 Genomes Project phase 1 release. We used the subset of 1000 Genomes variants present on each array to impute the remaining variants and assessed coverage based on correlation between imputed and observed allelic dosages. More than 75% of common variants (minor allele frequency > 0.05) are covered by all arrays in all groups except for African ancestry, and up to ~90% in all ancestries for the highest density arrays. In contrast, less than 40% of less common variants (0.01 < minor allele frequency < 0.05) are covered by low density arrays in all ancestries and 50–80% in high density arrays, depending on ancestry. We also calculated genome-wide power to detect variant-trait association in a case-control design, across varying sample sizes, effect sizes, and minor allele frequency ranges, and compare these array-based power estimates with a hypothetical array that would type all variants in 1000 Genomes. These imputation-based genomic coverage and power analyses are intended as a practical guide to researchers planning genetic studies.
- Published
- 2013
8. Detecting Coevolution through Allelic Association between Physically Unlinked Loci
- Author
-
Rori V. Rohlfs, Willie J. Swanson, and Bruce S. Weir
- Subjects
Linkage disequilibrium ,Zona pellucida glycoprotein ,Genotype ,Population ,Egg protein ,Receptors, Cell Surface ,Biology ,Zona Pellucida Glycoproteins ,Linkage Disequilibrium ,Article ,03 medical and health sciences ,0302 clinical medicine ,Genetics ,Humans ,Genetics(clinical) ,Allele ,education ,Alleles ,Genetics (clinical) ,Selection (genetic algorithm) ,030304 developmental biology ,Linkage (software) ,0303 health sciences ,education.field_of_study ,Membrane Glycoproteins ,Polymorphism, Genetic ,Egg Proteins ,030217 neurology & neurosurgery
- Abstract
Coevolving interacting genes undergo complementary mutations to maintain their interaction. Distinct combinations of alleles in coevolving genes interact differently, conferring varying degrees of fitness. If this fitness differential is adequately large, the resulting selection for allele matching could maintain allelic association, even between physically unlinked loci. Allelic association is often observed in a population with the use of gametic linkage disequilibrium. However, because the coevolving genes are not necessarily in physical linkage, this is not an appropriate measure of coevolution-induced allelic association. Instead, we propose using both composite linkage disequilibrium (CLD) and a measure of association between genotypes, which we call genotype association (GA). Using a simple selective model, we simulated loci and calculated power for tests of CLD and GA, showing that the tests can detect the allelic association expected under realistic selective pressure. We apply CLD and GA tests to the polymorphic, physically unlinked, and putatively coevolving human gamete-recognition genes ZP3 and ZP3R. We observe unusual allelic association, not attributable to population structure, between ZP3 and ZP3R. This study shows that selection for allele matching can drive allelic association between unlinked loci in a contemporary human population, and that selection can be detected with the use of CLD and GA tests. The observation of this selection is surprising, but reasonable in the highly selected system of fertilization. If confirmed, this sort of selection provides an exception to the paradigm of chromosomal independent assortment.
- Published
- 2010
9. Correlation-Based Inference for Linkage Disequilibrium With Multiple Alleles
- Author
-
Alexander I. Pudovkin, Dmitri V. Zaykin, and Bruce S. Weir
- Subjects
Linkage disequilibrium ,Genotype ,Investigations ,Biology ,Polymorphism, Single Nucleotide ,Linkage Disequilibrium ,Gene Frequency ,Genetics ,Chi-square test ,Test statistic ,Animals ,Humans ,Alleles ,Statistic ,Contingency table ,Stochastic Processes ,Models, Statistical ,Models, Genetic ,Chromosome Mapping ,Models, Theoretical ,Exact test ,Haplotypes ,Sample size determination ,Total correlation ,Monte Carlo Method ,Software
- Abstract
The correlation between alleles at a pair of genetic loci is a measure of linkage disequilibrium. The square of the sample correlation multiplied by sample size provides the usual test statistic for the hypothesis of no disequilibrium for loci with two alleles and this relation has proved useful for study design and marker selection. Nevertheless, this relation holds only in a diallelic case, and an extension to multiple alleles has not been made. Here we introduce a similar statistic, R², which leads to a correlation-based test for loci with multiple alleles: for a pair of loci with k and m alleles, and a sample of n individuals, the approximate distribution of n(k – 1)(m – 1)/(km) R² under independence between loci is $\chi^{2}_{(k-1)(m-1)}$. One advantage of this statistic is that it can be interpreted as the total correlation between a pair of loci. When the phase of two-locus genotypes is known, the approach is equivalent to a test for the overall correlation between rows and columns in a contingency table. In the phase-known case, R² is the sum of the squared sample correlations for all km 2 × 2 subtables formed by collapsing to one allele vs. the rest at each locus. We examine the approximate distribution under the null of independence for R² and report its close agreement with the exact distribution obtained by permutation. The test for independence using R² is a strong competitor to approaches such as Pearson's chi square, Fisher's exact test, and a test based on Cressie and Read's power divergence statistic. We combine this approach with our previous composite-disequilibrium measures to address the case when the genotypic phase is unknown. Calculation of the new multiallele test statistic and its P-value is very simple and utilizes the approximate distribution of R². We provide a computer program that evaluates approximate as well as “exact” permutational P-values.
- Published
- 2008
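Note on result 9: the abstract describes the phase-known statistic as the sum of squared sample correlations over all km "one allele vs. the rest" 2 × 2 subtables, scaled by n(k – 1)(m – 1)/(km). The sketch below follows that description for a k × m table of haplotype counts. It is an illustration only, not the authors' program: the function name and the choice to use the table total as n are assumptions, and no P-value is computed.

```python
def multiallele_r2_statistic(table):
    """Sketch of the multiallele correlation statistic for phase-known data.

    table: k x m list of lists of haplotype counts (rows = alleles at the
    first locus, columns = alleles at the second locus).
    Returns (R2, scaled statistic, degrees of freedom).
    Assumption: n is taken to be the total count in the table.
    """
    k, m = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_freq = [sum(row) / n for row in table]
    col_freq = [sum(table[i][j] for i in range(k)) / n for j in range(m)]
    if any(p <= 0.0 or p >= 1.0 for p in row_freq + col_freq):
        raise ValueError("every allele must be observed and none fixed")

    r2_total = 0.0
    for i in range(k):
        for j in range(m):
            p_i, q_j = row_freq[i], col_freq[j]
            # Covariance of the indicators "allele i" and "allele j"
            d = table[i][j] / n - p_i * q_j
            # Squared correlation for this collapsed 2 x 2 subtable
            r2_total += d * d / (p_i * (1 - p_i) * q_j * (1 - q_j))

    statistic = n * (k - 1) * (m - 1) / (k * m) * r2_total
    return r2_total, statistic, (k - 1) * (m - 1)


if __name__ == "__main__":
    print(multiallele_r2_statistic([[40, 10], [10, 40]]))
```

In the diallelic case all four subtables share the same r², so the scaled statistic reduces to n·r², the usual test statistic mentioned at the start of the abstract.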
10. Exact Inference for Hardy-Weinberg Proportions with Missing Genotypes: Single and Multiple Imputation
- Author
-
Stephanie M. Gogarten, Sarah C. Nelson, Jan Graffelman, Bruce S. Weir, Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa, and Universitat Politècnica de Catalunya. COSDA-UPC - COmpositional and Spatial Data Analysis
- Subjects
Linkage disequilibrium ,Biometry ,Disequilibrium ,Inference ,imputation ,Biology ,Investigations ,01 natural sciences ,Linkage Disequilibrium ,010104 statistics & probability ,03 medical and health sciences ,missing data ,exact test ,Statistics ,Genetics ,medicine ,Statistical inference ,Inbreeding ,Imputation (statistics) ,0101 mathematics ,Molecular Biology ,Genetics (clinical) ,030304 developmental biology ,0303 health sciences ,Models, Genetic ,Matemàtiques i estadística [Àrees temàtiques de la UPC] ,Hardy−Weinberg equilibrium ,Missing data ,Hardy–Weinberg principle ,Quantitative Biology::Genomics ,Data Accuracy ,Exact test ,Genetics, Population ,medicine.symptom ,Estadística mèdica ,Algorithms
- Abstract
This paper addresses the issue of exact-test based statistical inference for Hardy−Weinberg equilibrium in the presence of missing genotype data. Missing genotypes often are discarded when markers are tested for Hardy−Weinberg equilibrium, which can lead to bias in the statistical inference about equilibrium. Single and multiple imputation can improve inference on equilibrium. We develop tests for equilibrium in the presence of missingness by using both inbreeding coefficients (or, equivalently, χ² statistics) and exact p-values. The analysis of a set of markers with a high missing rate from the GENEVA project on prematurity shows that exact inference on equilibrium can be altered considerably when missingness is taken into account. For markers with a high missing rate (>5%), we found that both single and multiple imputation tend to diminish evidence for Hardy−Weinberg disequilibrium. Depending on the imputation method used, 6−13% of the test results changed qualitatively at the 5% level.
- Published
- 2015
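Note on result 10: the abstract mentions that inbreeding coefficients and χ² statistics are equivalent for Hardy−Weinberg testing. A minimal sketch of that equivalence for one diallelic marker with complete genotype data is given below, using the standard relations f = 1 − H_obs / H_exp and χ² = n·f². It does not implement the exact test or the imputation procedures from the paper, and the function name is a hypothetical choice.

```python
def hwe_inbreeding_and_chi2(n_aa, n_ab, n_bb):
    """Inbreeding coefficient f and one-df chi-square for a diallelic marker.

    n_aa, n_ab, n_bb: observed counts of genotypes AA, AB, BB.
    Assumption: individuals with missing genotypes have already been
    excluded, which the abstract notes can itself bias inference.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    h_exp = 2 * p * (1 - p)              # expected heterozygosity under HWE
    if h_exp == 0:
        raise ValueError("marker is monomorphic")
    f = 1 - (n_ab / n) / h_exp           # heterozygote deficit relative to HWE
    chi2 = n * f * f                     # equals Pearson's chi-square here
    return f, chi2


if __name__ == "__main__":
    print(hwe_inbreeding_and_chi2(30, 40, 30))
```

For the counts (30, 40, 30) the expected counts under equilibrium are (25, 50, 25), so Pearson's statistic is 1 + 2 + 1 = 4, which matches n·f² with f = 0.2.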
11. A haplotype map of the human genome
- Author
-
Mark Leppert, Aravinda Chakravarti, Charmaine D.M. Royal, Sarah S. Murray, Renzong Qiu, Panos Deloukas, Renwu Wang, David A. Hinds, Barbara E. Stranger, Xiaoli Tang, Huanming Yang, John W. Belmont, Nigel P. Carter, Huy Nguyen, William Mak, Kazuto Kato, Shiran Pasternak, Chaohua Li, Jeffrey C. Barrett, Lon R. Cardon, Vincent Ferretti, Atsushi Nagashima, Peter E. Chen, Stephen F. Schaffner, Hongbo Fu, Zhu Chen, Siqi Liu, John Burton, Paul Hardenbol, Gudmundur A. Thorisson, Yusuke Nakamura, Mark Griffiths, Imtiaz Yakub, Eiko Suda, Gonçalo R. Abecasis, Carl S. Kashuk, Qingrun Zhang, Yoshimitsu Fukushima, Karen Kennedy, Sarah E. Hunt, Yi Wang, Norio Niikawa, Ichiro Matsuda, Lynn F. Zacharia, Lalitha Krishnan, Zhen Wang, Stéphanie Roumy, C M Clee, David J. Cutler, Albert V. Smith, Lincoln Stein, Simon Myers, Jane Peterson, Jun Zhou, Yozo Ohnishi, Weihua Guan, Matthew Stephens, Xiaoyan Xiong, Julian Maller, Houcan Zhang, Pui-Yan Kwok, Mark S. Guyer, Liuda Ziaugra, Jonathan Witonsky, Matthew C. Jones, Stacey Gabriel, You-Qiang Song, Daochang An, Haifeng Wang, Gilean McVean, Lawrence M. Sung, Zhijian Yao, Yan Shen, Yangfan Liu, George M. Weinstock, Ludmila Pawlikowska, Erica Sodergren, Mark T. Ross, Andrew Boudreau, Toshihiro Tanaka, Thomas D. Willis, Weitao Hu, Kelly A. Frazer, Li Jin, Robert W. Plumb, Paul I.W. de Bakker, Hongbin Zhao, Wei Lin, Sarah Sims, Richard A. Gibbs, Maura Faggart, Michael Feolo, Dennis G. Ballinger, Xun Chu, Lucinda Fulton, Marcos Delgado, Ellen Winchester, Wei Huang, Fuli Yu, Christianne R. Bird, Shaun Purcell, Jessica Roy, Dongmei Cai, Launa M. Galver, Bartha Maria Knoppers, Emmanouil T. Dermitzakis, Gao Yang, Takashi Morizono, Rachel Barry, Kirsten McLay, Daryl J. Thomas, Steve McCarroll, Jonathan Marchini, Daniel J. Richter, Andy Peiffer, Patricia Taillon-Miller, Richard K. Wilson, Stephen Kwok-Wing Tsui, Jian-Bing Fan, Lisa D. Brooks, Laura L. Stuve, Paul L'Archevêque, David M. 
Evans, Clémentine Sallée, Peter Donnelly, Hong Xue, Hui Zhao, Charles N. Rotimi, Jean E. McEwen, J. Tze Fei Wong, Hao Pan, Alastair Kent, Brendan Blumenstiel, Qing Li, Weiwei Sun, L. Kang, Colin Freeman, John Stewart, Chibuzor Nkwodimmah, Morris W. Foster, Don Powell, Leonardo Bottolo, Raymond D. Miller, Stephen T. Sherry, Francis S. Collins, Donna M. Muzny, Jun Yu, Ike Ajayi, Hua Han, Pardis C. Sabeti, Hongguang Wang, Takahisa Kawaguchi, Tatsuhiko Tsunoda, Guy Bellemare, Zhaohui S. Qin, H. B. Hu, Jane Rogers, Thomas J. Hudson, Mark J. Daly, Andrew P. Morris, Supriya Gupta, Ming Xiao, Patrick Varilly, Nick Patterson, Akihiro Sekine, Chris C. A. Spencer, Jonathan Morrison, Missy Dixon, Paul K.H. Tam, Jian Wang, Matthew Defelice, Susana Eyheramendy, Michael Shi, Yungang He, Ellen Wright Clayton, Richa Saxena, Heather M. Munro, Arthur L. Holden, Yayun Shen, Christine P. Bird, Bruce W. Birren, Itsik Pe'er, David R. Bentley, Lynne V. Nazareth, Pamela Whittaker, Pak C. Sham, Amy L. Camargo, David A. Wheeler, Koji Saeki, Martin Godbout, David Altshuler, Liang Xu, Ying Wang, David Willey, Alexandre Montpetit, Shin Lin, Michael S. Phillips, Changqing Zeng, Clement Adebamowo, John C. Wallenburg, Mark S. Chee, Ben Fry, Erich Stahl, Melissa Parkin, Rhian Gwilliam, Andrei Verner, Patrick J. Nailer, Lap-Chee Tsui, Bo Zhang, Fanny Chagnon, David R. Cox, Jack Spiegel, Jamie Moore, Vivian Ota Wang, Patricia A. Marshall, Takuya Kitamoto, Bruce S. Weir, Darryl Macer, Geraldine M. Clarke, Robert C. Onofrio, Mary M.Y. Waye, Wei Wang, Suzanne M. Leal, James C. Mullikin, Toyin Aniagwu, Daniel C. Koboldt, Mary Goyette, Martin Leboeuf, Isaac F. Adewole, Ruth Jamieson, Arnold Oliphant, Jessica Watkin, and Jean François Olivier
- Subjects
Linkage disequilibrium ,Biology ,DNA, Mitochondrial ,Polymorphism, Single Nucleotide ,Article ,Linkage Disequilibrium ,Structural variation ,Gene Frequency ,Humans ,Selection, Genetic ,International HapMap Project ,Genetic association ,Haplotypes - genetics ,Recombination, Genetic ,Genetics ,Chromosomes, Human, Y ,Multidisciplinary ,Genome, Human ,DNA, Mitochondrial - genetics ,Haplotype ,Tag SNP ,Polymorphism, Single Nucleotide - genetics ,Haplotypes ,Human genome ,Haplotype estimation ,Chromosomes, Human, Y - genetics
- Abstract
Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution. © 2005 Nature Publishing Group.
- Published
- 2005
12. Affected Sib Pair Tests in Inbred Populations
- Author
-
W. Liu and Bruce S. Weir
- Subjects
Genetics ,education.field_of_study ,Linkage disequilibrium ,Population ,Locus (genetics) ,Biology ,Statistics ,Cutoff ,False positive rate ,education ,Null hypothesis ,Inbreeding ,Genetics (clinical) ,Statistical hypothesis testing
- Abstract
The affected-sib-pair (ASP) method for detecting linkage between a disease locus and marker loci was first established 50 years ago, and since then numerous modifications have been made. We modify two identity-by-state (IBS) test statistics of Lange (Lange, 1986a, 1986b) to allow for inbreeding in the population. We evaluate the power and false positive rates of the modified tests under three disease models, using simulated data. Before estimating false positive rates, we demonstrate that IBS tests are tests of both linkage and linkage disequilibrium between marker and disease loci. Therefore, the null hypothesis of IBS tests should be no linkage and no LD. When the population inbreeding coefficient is large, the false positive rates of Lange's tests become much larger than the nominal value, while those of our modified tests remain close to the nominal value. To estimate power with a controlled false positive rate, we choose the cutoff values based on simulated datasets under the null hypothesis, so that both Lange's tests and the modified tests generate the same false positive rate. The powers of Lange's z-test and our modified z-test are very close and do not change much with increasing inbreeding. The power of the modified chi-square test also stays stable when the inbreeding coefficient increases. However, the power of Lange's chi-square test increases with increasing inbreeding, and is larger than that of our modified chi-square test for large inbreeding coefficients. The power is high under a recessive disease model for both Lange's tests and the modified tests, though the power is low for additive and dominant disease models. Allowing for inbreeding is therefore appropriate, at least for diseases known to be recessive.
- Published
- 2004
13. Properties of the Multiallelic Trend Test
- Author
-
Wendy Czika and Bruce S. Weir
- Subjects
Genetic Markers ,Statistics and Probability ,Linkage disequilibrium ,Disease status ,Biometry ,Population ,Multiple alleles ,Biology ,Linkage Disequilibrium ,General Biochemistry, Genetics and Molecular Biology ,Statistics ,Econometrics ,Humans ,Allele ,education ,Alleles ,Disease gene ,education.field_of_study ,Chi-Square Distribution ,Models, Genetic ,General Immunology and Microbiology ,Applied Mathematics ,General Medicine ,Trend analysis ,Genetic marker ,Case-Control Studies ,General Agricultural and Biological Sciences
- Abstract
Disease genes can be mapped on the basis of associations between genetic markers and disease status, with the case–control design having the advantage of not requiring individuals from different generations. When the marker loci have multiple alleles, there has been debate on whether the power of tests for association increases or decreases. We show here that the multiple-allele version of Armitage's trend test has increased power over the two-allele version under the requirement of equifrequent alleles, but not in general. The trend test has the advantage of remaining valid even when the sampled population is not in Hardy–Weinberg equilibrium. A departure from Hardy–Weinberg means that association tests depend on gametic and nongametic linkage disequilibrium between marker and disease loci, and we illustrate the magnitude of these effects with simulated data.
- Published
- 2004
14. Allelic association patterns for a dense SNP map
- Author
-
Bruce S. Weir, Lon R. Cardon, and William G. Hill
- Subjects
Linkage disequilibrium ,Genotype ,Epidemiology ,Population ,Chromosomes, Human, Pair 20 ,Single-nucleotide polymorphism ,Biology ,Polymorphism, Single Nucleotide ,Linkage Disequilibrium ,White People ,Asian People ,Humans ,Genetic Predisposition to Disease ,Association mapping ,education ,Genotyping ,Alleles ,Genetics (clinical) ,Genetic association ,Genetics ,education.field_of_study ,Models, Statistical ,Models, Genetic ,Genome, Human ,Haplotype ,Chromosome Mapping ,Tag SNP ,Black or African American ,Genetics, Population ,Haplotypes ,Case-Control Studies
- Abstract
A dense set of 5,000 SNPs on a 10-Mb region of human chromosome 20 has been typed on samples of African Americans, East Asians, and United Kingdom Caucasians. There are departures from Hardy-Weinberg equilibrium beyond the level at which markers are often discarded because of possible genotyping errors. The observation that markers showing such departures are often close together on the chromosome confirms the result that Hardy-Weinberg tests at two loci are correlated to an extent that depends on the linkage disequilibrium between those two markers. Linkage disequilibrium can be described by the composite linkage disequilibrium coefficient, the parameter that determines the behavior of case-control allelic tests of association. A useful preliminary investigation of datasets of this type is provided by counting the numbers of distinct multi-locus genotypes in windows of a few markers.
- Published
- 2004
- Full Text
- View/download PDF
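Illustrative note on the preliminary investigation mentioned in the abstract above (counting distinct multi-locus genotypes in windows of a few markers): a minimal sketch, assuming genotypes are coded as one tuple per individual with one entry per marker. The function name `distinct_genotypes_per_window` and the coding scheme are hypothetical and are not taken from the paper.

```python
def distinct_genotypes_per_window(genotypes, window):
    """Count distinct multi-locus genotypes in each sliding window.

    genotypes: list of equal-length tuples, one per individual, holding a
               genotype code (for example 0, 1, 2) at each marker.
    window:    number of adjacent markers per window.
    Returns one count per window start position.
    """
    if not genotypes:
        return []
    n_markers = len(genotypes[0])
    counts = []
    for start in range(n_markers - window + 1):
        # Collect the set of genotype patterns seen across this window.
        seen = {ind[start:start + window] for ind in genotypes}
        counts.append(len(seen))
    return counts
```

Windows with few distinct multi-locus genotypes relative to the sample size suggest strong association among the markers they contain.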
15. Association Studies under General Disease Models
- Author
-
Bruce S. Weir and Dahlia M. Nielsen
- Subjects
Genetic Markers ,Genetics ,Candidate gene ,Linkage disequilibrium ,Genotype ,Models, Genetic ,Chromosome Mapping ,Locus (genetics) ,Tag SNP ,Biology ,Linkage Disequilibrium ,Apolipoproteins E ,Phenotype ,Cytochrome P-450 CYP2D6 ,Haplotypes ,Humans ,Genetic Predisposition to Disease ,Allele ,Association mapping ,Allele frequency ,Alleles ,Ecology, Evolution, Behavior and Systematics ,Genetic association - Abstract
There is great expectation that the levels of association found between genetic markers and disease status will play a role in the location of disease genes. This expectation follows from regarding association as being proportional to linkage disequilibrium and therefore inversely related to recombination value. For disease genes with more than two alleles, the association measure is instead a weighted average of linkage disequilibria, with the weights depending on allele frequencies and genotype susceptibilities at the disease loci. There is no longer a simple relationship, even in expectation, with recombination. We adopt a general framework to examine association mapping methods which helps to clarify the nature of case-control and transmission/disequilibrium-type tests and reveals the relationship between measures of association and coefficients of linkage disequilibrium. In particular, we can show the consequences of additive and nonadditive effects at the trait locus on the behavior of these tests. These concepts have a natural extension to marker haplotypes. The association of two-locus marker haplotypes with disease phenotype depends on a weighted average of three-locus disequilibria (two markers with each disease locus). It is likely that these two-marker analyses will provide additional information in association mapping studies.
- Published
- 2001
- Full Text
- View/download PDF
16. A Comparative Study of Sibship Tests of Linkage and/or Association
- Author
-
Norman L. Kaplan, Bruce S. Weir, and S.A. Monks
- Subjects
Genetic Markers ,Parents ,Linkage disequilibrium ,Genetic Linkage ,Population ,Context (language use) ,Disease ,Biology ,Nuclear Family ,Association ,Family-based tests ,Genetics ,False Positive Reactions ,Genetics(clinical) ,education ,Nuclear family ,Alleles ,Genetics (clinical) ,Linkage (software) ,education.field_of_study ,Models, Statistical ,Models, Genetic ,Transmission/disequilibrium test ,Linkage ,Genetic Diseases, Inborn ,Case-control study ,Transmission disequilibrium test ,Research Design ,Case-Control Studies ,Power study ,Monte Carlo Method ,Research Article ,Demography - Abstract
Summary. Population-based tests of association have used data from either case-control studies or studies based on trios (affected child and parents). Case-control studies are more prone to false-positive results caused by inappropriate controls, which can occur if, for example, there is population admixture or stratification. An advantage of family-based tests is that cases and controls are well matched, but parental data may not always be available, especially for late-onset diseases. Three recent family-based tests of association and linkage utilize unaffected siblings as surrogates for untyped parents. In this paper, we propose an extension of one of these tests. We describe and compare the four tests in the context of a complex disease for both biallelic and multiallelic markers, as well as for sibships of different sizes. We also examine the consequences of having some parental data in the sample.
- Published
- 1998
- Full Text
- View/download PDF
17. Tests for Linkage and Association in Nuclear Families
- Author
-
Norman L. Kaplan, Bruce S. Weir, and Eden R. Martin
- Subjects
Genetic Markers ,Linkage disequilibrium ,Genetic Linkage ,Locus (genetics) ,Biology ,Linkage Disequilibrium ,Nuclear Family ,03 medical and health sciences ,0302 clinical medicine ,Gene mapping ,Genetic linkage ,Chi-square test ,Genetics ,Humans ,Genetic Predisposition to Disease ,Genetics(clinical) ,Allele ,Child ,Nuclear family ,Alleles ,Genetics (clinical) ,030304 developmental biology ,0303 health sciences ,Chi-Square Distribution ,Models, Genetic ,Genetic Diseases, Inborn ,Reproducibility of Results ,Transmission disequilibrium test ,stomatognathic diseases ,Monte Carlo Method ,030217 neurology & neurosurgery ,Research Article - Abstract
Summary. The transmission/disequilibrium test (TDT) originally was introduced to test for linkage between a genetic marker and a disease-susceptibility locus, in the presence of association. Recently, the TDT has been used to test for association in the presence of linkage. The motivation for this is that linkage analysis typically identifies large candidate regions, and further refinement is necessary before a search for the disease gene is begun, on the molecular level. Evidence of association and linkage may indicate which markers in the region are closest to a disease locus. As a test of linkage, transmissions from heterozygous parents to all of their affected children can be included in the TDT; however, the TDT is a valid χ² test of association only if transmissions to unrelated affected children are used in the analysis. If the sample contains independent nuclear families with multiple affected children, then one procedure that has been used to test for association is to select randomly a single affected child from each sibship and to apply the TDT to those data. As an alternative, we propose two statistics that use data from all of the affected children. The statistics give valid χ² tests of the null hypothesis of no association or no linkage and generally are more powerful than the TDT with a single, randomly chosen, affected child from each family.
- Published
- 1997
- Full Text
- View/download PDF
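Illustrative note on the TDT discussed in the abstract above: a minimal sketch of the classic biallelic TDT statistic in its McNemar form, not of the two new statistics the authors propose. The function names and the counts in the test are assumptions for illustration. The input is the number of times heterozygous parents transmitted, or did not transmit, a given marker allele to an affected child; as the abstract notes, the chi-square reference distribution is valid for association only when the affected children are unrelated.

```python
import math


def tdt_statistic(transmitted, not_transmitted):
    """Classic TDT statistic: (b - c)^2 / (b + c).

    transmitted:     times heterozygous parents passed the allele on (b).
    not_transmitted: times they passed on the other allele instead (c).
    """
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    return (transmitted - not_transmitted) ** 2 / total


def chisq1_pvalue(statistic):
    """Upper-tail probability of a chi-square variable with 1 df."""
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2))
```

With 60 transmissions and 40 non-transmissions, for example, the statistic is 4.0 and the 1-df P-value is about 0.046.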
18. Interpreting Whole-Genome Marker Data
- Author
-
Bruce S. Weir
- Subjects
Statistics and Probability ,Genetics ,Linkage disequilibrium ,Genetic marker ,Statistical genetics ,Genotype ,Single-nucleotide polymorphism ,Biology ,Association mapping ,Biochemistry, Genetics and Molecular Biology (miscellaneous) ,Genome ,Hardy–Weinberg principle ,Article - Abstract
The challenges of whole-genome data, when genotypes are available from hundreds of thousands of genetic markers, are explored for four topics in statistical genetics: Hardy-Weinberg testing, estimating linkage disequilibrium from unphased genotypic data, association mapping and characterizing population structure.
- Published
- 2013
19. Genetic Markers in Clinical Trials
- Author
-
Bruce S. Weir and P. J. Heagerty
- Subjects
Linkage disequilibrium ,Genetic marker ,SNP ,Single-nucleotide polymorphism ,Human genome ,Quantitative genetics ,Computational biology ,Biology ,Association mapping ,Regression - Abstract
The current availability of dense sets of marker SNPs for the human genome is having a large impact on genetic studies and offers new possibilities for clinical trials. This chapter offers a unified basis for the analysis of marker and response data, emphasizing the central importance of the correlation, or linkage disequilibrium, between SNP markers and the genes that affect response. It is convenient to phrase the development of association mapping in the language of quantitative genetics, using additive and non-additive components of variance. A novel feature of dense SNP data is that good estimates can be made of actual inbreeding and relatedness. These estimates are more relevant than values predicted from family pedigree, and are all that are available in the absence of family data. The dimensionality of SNP marker datasets has required the development of new methods that are appropriate for a large number of statistical comparisons, and the development of computational methods that allow high-dimensional regression. These methods are reviewed here, as is the use of biological annotation for both viewing the relevance of empirical associations, and to structure analysis in order to focus on those markers with the highest expectation for association with the outcomes under study.
- Published
- 2012
- Full Text
- View/download PDF
20. Distributions of Hardy-Weinberg equilibrium test statistics
- Author
-
Rori V. Rohlfs and Bruce S. Weir
- Subjects
Genetics ,Models, Statistical ,Models, Genetic ,Null (mathematics) ,Binomial test ,Biology ,Investigations ,Kolmogorov–Smirnov test ,Polymorphism, Single Nucleotide ,Linkage Disequilibrium ,Exact test ,symbols.namesake ,Case-Control Studies ,Test statistic ,Null distribution ,Chi-square test ,symbols ,Probability distribution ,Applied mathematics ,Humans ,Genome-Wide Association Study ,Statistical Distributions - Abstract
It is well established that test statistics and P-values derived from discrete data, such as genetic markers, are also discrete. In most genetic applications, the null distribution for a discrete test statistic is approximated with a continuous distribution, but this approximation may not be reasonable. In some cases using the continuous approximation for the expected null distribution may cause truly null test statistics to appear nonnull. We explore the implications of using continuous distributions to approximate the discrete distributions of Hardy–Weinberg equilibrium test statistics and P-values. We derive exact P-value distributions under the null and alternative hypotheses, enabling a more accurate analysis than is possible with continuous approximations. We apply these methods to biological data and find that using continuous distribution theory with exact tests may underestimate the extent of Hardy–Weinberg disequilibrium in a sample. The implications may be most important for the widespread use of whole-genome case–control association studies and Hardy–Weinberg equilibrium (HWE) testing for data quality control.
- Published
- 2008
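Illustrative note on the discreteness described in the abstract above: a minimal sketch that lists every value the Hardy-Weinberg chi-square statistic can take for a fixed sample size and allele count. It is not the authors' derivation of exact P-value distributions; the function names and the example counts are assumptions. Given n individuals and n_a copies of allele A, the heterozygote count must have the same parity as n_a, so only a handful of genotype configurations, and therefore test statistics, are attainable.

```python
def hwe_chisq(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic for Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # sample frequency of allele A
    if not 0 < p < 1:
        raise ValueError("both alleles must be present in the sample")
    expected = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def attainable_statistics(n, n_a):
    """All (heterozygote count, chi-square) pairs for n individuals
    carrying n_a copies of allele A among their 2n alleles."""
    if not 0 < n_a < 2 * n:
        raise ValueError("allele count must be strictly between 0 and 2n")
    results = []
    # Heterozygote counts share the parity of n_a and cannot exceed
    # the count of the rarer allele.
    for n_ab in range(n_a % 2, min(n_a, 2 * n - n_a) + 1, 2):
        n_aa = (n_a - n_ab) // 2
        n_bb = n - n_ab - n_aa
        results.append((n_ab, hwe_chisq(n_aa, n_ab, n_bb)))
    return results
```

For 10 individuals carrying 4 copies of the minor allele there are only three attainable statistics, which shows why a continuous chi-square approximation can fit poorly for rare alleles.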
21. Linkage disequilibrium in wild mice
- Author
-
Deborah A. Nickerson, Amy D. Anderson, Robert J. Livingston, Cathy C. Laurie, Matthew D. Dean, Kimberly L. Smith, Bruce S. Weir, Michael W. Nachman, and Eric E. Schadt
- Subjects
0106 biological sciences ,Linkage disequilibrium ,Cancer Research ,lcsh:QH426-470 ,Molecular Sequence Data ,Quantitative Trait Loci ,Animals, Wild ,Quantitative trait locus ,Biology ,010603 evolutionary biology ,01 natural sciences ,Linkage Disequilibrium ,03 medical and health sciences ,Mice ,0302 clinical medicine ,Gene mapping ,Inbred strain ,Genetic variation ,Genetics ,Animals ,Association mapping ,Molecular Biology ,Genetics (clinical) ,Ecology, Evolution, Behavior and Systematics ,Phylogeny ,Genetic association ,030304 developmental biology ,0303 health sciences ,Evolutionary Biology ,Arizona ,Genetic Variation ,Genetics and Genomics ,Mus (Mouse) ,3. Good health ,lcsh:Genetics ,Inbreeding ,030217 neurology & neurosurgery ,Research Article - Abstract
Crosses between laboratory strains of mice provide a powerful way of detecting quantitative trait loci for complex traits related to human disease. Hundreds of these loci have been detected, but only a small number of the underlying causative genes have been identified. The main difficulty is the extensive linkage disequilibrium (LD) in intercross progeny and the slow process of fine-scale mapping by traditional methods. Recently, new approaches have been introduced, such as association studies with inbred lines and multigenerational crosses. These approaches are very useful for interval reduction, but generally do not provide single-gene resolution because of strong LD extending over one to several megabases. Here, we investigate the genetic structure of a natural population of mice in Arizona to determine its suitability for fine-scale LD mapping and association studies. There are three main findings: (1) Arizona mice have a high level of genetic variation, which includes a large fraction of the sequence variation present in classical strains of laboratory mice; (2) they show clear evidence of local inbreeding but appear to lack stable population structure across the study area; and (3) LD decays with distance at a rate similar to human populations, which is considerably more rapid than in laboratory populations of mice. Strong associations in Arizona mice are limited primarily to markers less than 100 kb apart, which provides the possibility of fine-scale association mapping at the level of one or a few genes. Although other considerations, such as sample size requirements and marker discovery, are serious issues in the implementation of association studies, the genetic variation and LD results indicate that wild mice could provide a useful tool for identifying genes that cause variation in complex traits. Author Summary: Linkage disequilibrium (LD) refers to the nonrandom association of variants at different sites in the genome.
In recent years, LD has been of great interest in biomedical research because of its utility in “association studies,” where DNA sequence variants associated with disease traits are used to identify susceptibility genes. The resolution of this gene-finding tool depends on how the LD decays with distance between the associated sites. The pattern of LD decay is well known in human populations, where it provides high resolution on the order of one or a few genes. This paper shows that the pattern of LD in wild house mice (in contrast to laboratory mice) is very similar to that in human populations. This result means that wild mice (reared in the laboratory) could be used in association studies to identify genes that cause trait variation. Wild mouse association studies might complement those in humans by dealing with traits that are difficult to measure in humans (such as response to carcinogen exposure) and by filtering human associations for subsequent validation with genetically engineered mouse models.
- Published
- 2007
22. A population-based latent variable approach for association mapping of quantitative trait loci
- Author
-
Zhao-Bang Zeng, Bruce S. Weir, and Tao Wang
- Subjects
Genetics ,education.field_of_study ,Likelihood Functions ,Models, Genetic ,Population ,Quantitative Trait Loci ,Statistics as Topic ,food and beverages ,Chromosome Mapping ,Biology ,Quantitative trait locus ,Genetic architecture ,Linkage Disequilibrium ,Genetics, Population ,Family-based QTL mapping ,Linkage based QTL mapping ,Inclusive composite interval mapping ,Multiple comparisons problem ,Humans ,Computer Simulation ,Association mapping ,education ,Genetics (clinical) ,Algorithms - Abstract
A population-based latent variable approach is proposed for association mapping of quantitative trait loci (QTL), using multiple closely linked genetic markers within a small candidate region in the genome. By incorporating QTL as latent variables into a penetrance model, the QTL are flexible to characterize either alleles at putative trait loci or potential risk haplotypes/sub-haplotypes of the markers. Under a general likelihood framework, we develop an EM-based algorithm to estimate genetic effects of the QTL and haplotype frequencies of the QTL and markers jointly. Closed form solutions derived in the maximization step of the EM procedure for updating the joint haplotype frequencies of QTL and markers can effectively reduce the computational intensity. Various association measures between QTL and markers can then be derived from the haplotype frequencies of markers and used to infer QTL positions. The likelihood ratio statistic also provides a joint test for association between a quantitative trait and marker genotypes without requiring adjustment for the multiple testing. Extensive simulation studies are performed to evaluate the approach.
- Published
- 2006
23. Evaluation of DNA pooling for the estimation of microsatellite allele frequencies: a case study using striped bass (Morone saxatilis)
- Author
-
Garrick T. Skalski, Bruce S. Weir, Charlene R. Couch, Amber F. Garber, and Craig V. Sullivan
- Subjects
Genetics ,Linkage disequilibrium ,Biometry ,Pooling ,Electrophoresis, Capillary ,Locus (genetics) ,DNA ,Biology ,Investigations ,Polymerase Chain Reaction ,Linkage Disequilibrium ,law.invention ,Gene Frequency ,law ,Microsatellite ,Animals ,Bass ,Allele ,Genotyping ,Allele frequency ,Polymerase chain reaction ,Alleles ,Microsatellite Repeats - Abstract
Using striped bass (Morone saxatilis) and six multiplexed microsatellite markers, we evaluated procedures for estimating allele frequencies by pooling DNA from multiple individuals, a method suggested as cost-effective relative to individual genotyping. Using moment-based estimators, we estimated allele frequencies in experimental DNA pools and found that the three primary laboratory steps, DNA quantitation and pooling, PCR amplification, and electrophoresis, accounted for 23, 48, and 29%, respectively, of the technical variance of estimates in pools containing DNA from 2–24 individuals. Exact allele-frequency estimates could be made for pools of sizes 2–8, depending on the locus, by using an integer-valued estimator. Larger pools of size 12 and 24 tended to yield biased estimates; however, replicates of these estimates detected allele frequency differences among pools with different allelic compositions. We also derive an unbiased estimator of Hardy–Weinberg disequilibrium coefficients that uses multiple DNA pools and analyze the cost-efficiency of DNA pooling. DNA pooling yields the most potential cost savings when a large number of loci are employed using a large number of individuals, a situation becoming increasingly common as microsatellite loci are developed in increasing numbers of taxa.
- Published
- 2006
24. Prediction of multi-locus inbreeding coefficients and relation to linkage disequilibrium in random mating populations
- Author
-
Bruce S. Weir and William G. Hill
- Subjects
Linkage disequilibrium ,education.field_of_study ,Genetic Linkage ,Population ,Population Dynamics ,food and beverages ,Locus (genetics) ,Biology ,Identity by descent ,Quantitative Biology::Genomics ,Article ,Consanguinity ,Genetics, Population ,Gene mapping ,Evolutionary biology ,Genetic linkage ,Quantitative Biology::Populations and Evolution ,Humans ,Association mapping ,education ,Inbreeding ,Ecology, Evolution, Behavior and Systematics ,Computer Science::Databases ,Algorithms ,Forecasting - Abstract
An algorithm to predict the level of identity by descent simultaneously at multiple loci is presented, which can in principle be extended to any number of loci. The model assumes a random mating population, with random association of haplotypes. The relationship is shown between coefficients of multi-locus identity or non-identity by descent and moments of multi-locus linkage disequilibrium. Thus, these moments can be computed from the multilocus identity or, using algorithms derived previously to predict the disequilibria moments, vice-versa. The results can be applied to predict multi-locus identity in, for example, gene mapping.
- Published
- 2005
25. Measures of human population structure show heterogeneity among genomic regions
- Author
-
Amy D. Anderson, Dahlia M. Nielsen, Bruce S. Weir, William G. Hill, and Lon R. Cardon
- Subjects
Population ,Biology ,Polymorphism, Single Nucleotide ,Linkage Disequilibrium ,Genetic Heterogeneity ,Effective population size ,Genetic drift ,Genetics ,Ethnicity ,Chromosomes, Human ,Humans ,Sample variance ,International HapMap Project ,education ,Allele frequency ,Genetics (clinical) ,education.field_of_study ,Genome, Human ,Genomics ,Articles ,United States ,Genetics, Population ,Haplotypes ,Evolutionary biology ,F-statistics ,Genetic structure - Abstract
Publication of the Perlegen SNP data set (Hinds et al. 2005) and completion of Phase I of the International HapMap Project (The International HapMap Consortium 2005) have allowed a new perspective on the genetic structure of human populations. These two whole-genome data sets allow population genetic analyses at an unprecedented scale: Previous estimates of genetic population structure (for review, see Garte 2003) have been based on a limited number of loci and provided only average figures of quantities such as FST (Wright 1951) across the whole genome. The precision of previous estimates is not high, and they relate only to specific genes rather than to the region in which the markers are located. We can expect there to be some diversity in the magnitude of population structure between regions of the genome because the precise genealogy is not the same for each chromosome or part thereof, with values becoming increasingly similar the more closely linked are the regions. The genealogy can differ both by random events and by non-random events such as selection. Strong selection at a locus will induce hitch-hiking of nearby regions (Maynard Smith and Haigh 1974), leading to both a reduction in heterozygosity within populations and an increase in diversity between populations as measured by FST. Examination of the differences in diversity between regions therefore provides an opportunity to identify those that cannot be explained solely in terms of random sampling of the genealogy due to Mendelian segregation, variation in family size, migration, and recombination between genetic sites. Methods for estimating FST from samples of a group of populations are well established (e.g., Weir and Cockerham 1984). 
More recently they have been discussed for estimating values separately for each of a set of populations assumed to come from a common founder, but which may differ both in their times of divergence from each other and in the sizes of the populations (Weir and Hill 2002; Shriver et al. 2004). The stochastic nature of evolution means that the actual allele frequencies in a population differ from the expected values, and the population-specific FST describes the variance of allele frequencies about the means for that population. Because there is only one realization of the population, the variance is estimated from the allele frequencies of that population and at least one other population. The average of the population-specific values is the usual (population-average) FST, and its estimate is proportional to the sample variance in allele frequencies among the sampled populations. It serves as a measure of genetic differentiation of the populations, and, in the case of population divergence being due to genetic drift, the value for each pair of populations serves as a measure of time since diverging from an ancestral population. Because there is not replication of each of the populations studied, the population-specific and population-average values are relative to the value in their ancestral population. In this paper we compute values of FST from all autosomes in the Perlegen and HapMap data sets, but we use only those SNPs that were found to be segregating in all population samples within each data set. Our estimates are calculated for all markers separately and also for all markers in all the 5-Mb windows centered on each SNP in the autosomal genome. The numbers of markers used are shown in Table 1. We find substantial diversity in these measures, and we attempt to explain how much of this can be attributed to sampling of different kinds. 
We consider the data as a function of the number and choice of sites in the region, and as a function of the individuals that comprise the sample. We predict the variation in identity at individual regions and their covariance with other regions expected from the sampling in genealogy of the population. Further, we examine the results to reveal regions associated with known genes that have been under selection in one or more of the populations so as to consider the utility of FST measures in gene location or in detecting signatures of past selective events. Table 1. Chromosome lengths and numbers of markers segregating in all samples within a data set
- Published
- 2005
26. Bioinformatics and approaches to identifying polygenic susceptibility traits
- Author
-
Bruce S. Weir
- Subjects
Genetic Markers ,Linkage disequilibrium ,Multifactorial Inheritance ,Models, Genetic ,business.industry ,Genome, Human ,Dental Informatics ,Context (language use) ,Single-nucleotide polymorphism ,General Medicine ,Bioinformatics ,Polymorphism, Single Nucleotide ,Genetic architecture ,Linkage Disequilibrium ,Genetic epidemiology ,Haplotypes ,Statistical genetics ,Genetic marker ,Risk Factors ,Medicine ,Humans ,Genetic Predisposition to Disease ,business ,Periodontal Diseases - Abstract
The role of genetic factors in periodontal disease is now well recognized, although details for the genetic mechanisms of the disease and implications for therapy can be as obscure as they are for other human traits. This paper addresses the role that the analysis of genome-wide data might play in helping to understand the molecular determinants of periodontal risk. Very few human diseases are not polygenic, in that an individual's susceptibility depends on his or her constitution at many genetic loci, each of which may have a small effect. Not only do these loci interact, but also their actions and interactions depend on nongenetic factors. Much of the statistical machinery to handle this complexity was developed in the plant and animal breeding context, where crosses between inbred lines selected for trait differences could be conducted. Human polygenic studies began with studies on large pedigrees, but have expanded to include case-control analyses of random samples of individuals who differ in disease status, and studies of marker transmissions within nuclear families. In the area of characterizing the genetic architecture of complex traits, the relatively new field of bioinformatics is distinguished from the more mature fields of statistical genetics or genetic epidemiology by its focus on genome-wide data. The very dense sets of genetic markers now available, particularly those at single nucleotide positions (SNPs), have meant that it is possible to seek linkages or associations between chromosomal position and disease from the whole genome in a single study. Apart from the obvious problems of scale, there are real issues involved with multiple testing and recognizing interactions. Current thinking tends to focus on relatively conserved "haplotype blocks" instead of single genetic markers, although there is no consensus on the utility of this emphasis.
- Published
- 2005
27. Case‐Only Gene Mapping
- Author
-
Bruce S. Weir and D. M. Nielsen
- Subjects
Genetics ,Candidate gene ,education.field_of_study ,Linkage disequilibrium ,Haplotype ,Population ,Locus (genetics) ,Ancestry-informative marker ,Biology ,Gene mapping ,Genotype ,Allele ,Association mapping ,education ,Genetic association - Abstract
When individuals can be characterized as being affected or unaffected by a disease, marker-trait association can be addressed by comparing marker frequencies between these two categories. It is also possible to infer linkage or association on the basis of marker data from affected individuals only. When there is linkage disequilibrium, a test for Hardy-Weinberg proportions among marker genotypes in the affected population is a test about marker allele and genotype associations with the disease locus. However, rejection of Hardy-Weinberg at the marker locus in the affected population may simply reflect nonrandom mating in the whole population. Keywords: association; candidate gene; disease marker; linkage disequilibrium; haplotype; Hardy-Weinberg
- Published
- 2005
- Full Text
- View/download PDF
28. Effect of Two- and Three-Locus Linkage Disequilibrium on the Power to Detect Marker/Phenotype Associations
- Author
-
Dmitri V. Zaykin, Margaret G. Ehm, Bruce S. Weir, and Dahlia M. Nielsen
- Subjects
Genetics ,Genetic Markers ,Linkage disequilibrium ,Polymorphism, Genetic ,Genotype ,Haplotype ,Computational Biology ,Genetic Variation ,Locus (genetics) ,Biology ,Investigations ,Phenotype ,Linkage Disequilibrium ,Quantitative Trait, Heritable ,Haplotypes ,Genetic marker ,Case-Control Studies ,Genetic variation ,Humans ,Pairwise comparison - Abstract
There has been much recent interest in describing the patterns of linkage disequilibrium (LD) along a chromosome. Most empirical studies that have examined this issue have concentrated on LD between collections of pairs of markers and have not considered the joint effect of a group of markers beyond these pairwise connections. Here, we examine many different patterns of LD defined by both pairwise and joint multilocus LD terms. The LD patterns we considered were chosen in part by examining those seen in real data. We examine how changes in these patterns affect the power to detect association when performing single-marker and haplotype-based case-control tests, including a novel haplotype test based on contrasting LD between affected and unaffected individuals. Through our studies we find that differences in power between single-marker tests and haplotype-based tests in general do not appear to be large. Where moderate to high levels of multilocus LD exist, haplotype tests tend to be more powerful. Single-marker tests tend to prevail when pairwise LD is high. For moderate pairwise values and weak multilocus LD, either testing strategy may come out ahead, although it is also quite likely that neither has much power.
- Published
- 2004
29. Moment estimation of population diversity and genetic distance from data on recessive markers
- Author
-
William G. Hill and Bruce S. Weir
- Subjects
Genetics ,Genetic Markers ,Linkage disequilibrium ,Genetic diversity ,Analysis of Variance ,Genotype ,Models, Genetic ,Sus scrofa ,Genetic Variation ,Genes, Recessive ,Biology ,Genotype frequency ,Evolution, Molecular ,Genetics, Population ,Genetic distance ,Species Specificity ,F-statistics ,Statistics ,Mutation (genetic algorithm) ,Animals ,Allele frequency ,Ecology, Evolution, Behavior and Systematics ,Polymorphism, Restriction Fragment Length - Abstract
A moment-based method for estimating a measure of population diversity, theta or Wright's FST, is given for dominant markers such as amplified fragment length polymorphisms (AFLPs) or RAPDs in noninbred populations. Basic assumptions are that there is random mating, Hardy-Weinberg equilibrium, linkage equilibrium, no mutation from common ancestor and equally distant populations. It is based on the variances between and within populations of genotype frequencies, whereas previously moment methods for dominant markers have been indirect in that they have been based on first estimating allele frequencies and then using the variances of those frequencies. The use of genotype frequencies directly appears to be more robust. Approximate sampling errors of the estimates are given. Methods are extended to estimate genetic distances and their sampling errors. The AFLP data from samples of breeds of pig are used for illustration.
- Published
- 2004
30. Analysis of single nucleotide polymorphisms in candidate genes using the pedigree disequilibrium test
- Author
-
Norman L. Kaplan, Bruce S. Weir, Sarah Hardy, and Eden R. Martin
- Subjects
0301 basic medicine ,Linkage disequilibrium ,Candidate gene ,Genotype ,Epidemiology ,Population ,Pedigree chart ,Single-nucleotide polymorphism ,Computational biology ,030105 genetics & heredity ,Biology ,Polymorphism, Single Nucleotide ,Linkage Disequilibrium ,03 medical and health sciences ,Quantitative Trait, Heritable ,Humans ,Genetic Predisposition to Disease ,education ,Association mapping ,Genetics (clinical) ,Genetic association ,education.field_of_study ,Models, Genetic ,Tag SNP ,030104 developmental biology ,Genetics, Population - Abstract
The pedigree disequilibrium test (PDT) has been proposed recently as a test for association in general pedigrees [Martin et al., Am J Hum Genet 67:146-54, 2000]. The Genetic Analysis Workshop (GAW) 12 simulated data, with many extended pedigrees, is an example of the type of data to which the PDT is ideally suited. In replicate 42 from the general population the PDT correctly identifies candidate genes 1, 2, and 6 as containing single nucleotide polymorphisms (SNPs) that are significantly associated with the disease. We also applied the truncated product method (TPM) [Zaykin et al., Genet Epidemiol, in press] to combine p-values in overlapping windows across the genes. Our results show that the TPM is helpful in identifying significant SNPs as well as removing spurious false positives. Our results indicate that, using the PDT, functional disease-associated SNPs can be successfully identified with a dense map of moderately polymorphic SNPs.
- Published
- 2002
31. Estimating the total number of alleles using a sample coverage method
- Author
-
Bruce S. Weir and Shu-Pang Huang
- Subjects
Genetics ,Linkage disequilibrium ,education.field_of_study ,Models, Statistical ,Models, Genetic ,Population ,Biology ,Genotype frequency ,Genetics, Population ,Genetic distance ,Sample size determination ,Mutation (genetic algorithm) ,Mutation ,Allele ,education ,Allele frequency ,Alleles ,Microsatellite Repeats ,Research Article - Abstract
Previously reported methods for estimating the number of different alleles at a single locus in a population have not described a useful general result. Using the number of alleles observed in a sample gives an underestimate for the true number of alleles. The similar problem of estimating the number of species in a population was first investigated in 1943. In this article we use the sample coverage method proposed by Chao and Lee in 1992 to estimate the number of alleles in a population when there are unequal allele frequencies. Simulation studies under the recurrent mutation model show that, for reasonable sample sizes, a significantly better estimate of the true number can be obtained than that using only the observed alleles. Results under the stepwise mutation model and infinite-allele model are presented. Possible applications include improving the characterization of the prior distribution for the allele frequencies, adjusting the estimates of genetic diversity, and estimating the range of microsatellite alleles.
- Published
- 2001
32. Sex-specific longevity associations defined by Tyrosine Hydroxylase-Insulin-Insulin Growth Factor 2 haplotypes on the 11p15.5 chromosomal region
- Author
-
Giuseppina Rose, G. De Benedictis, Valentina Greco, Massimiliano Bonafè, M. De Luca, Sabrina Garasto, Claudio Franceschi, and Bruce S. Weir
- Subjects
Adult ,Genetic Markers ,Male ,Aging ,Linkage disequilibrium ,Tyrosine 3-Monooxygenase ,Genetic Linkage ,medicine.medical_treatment ,media_common.quotation_subject ,Longevity ,Biology ,Biochemistry ,Endocrinology ,Genetic linkage ,Insulin-Like Growth Factor II ,Genetics ,medicine ,Humans ,Insulin ,Molecular Biology ,media_common ,Aged ,Sex Characteristics ,Tyrosine hydroxylase ,Chromosomes, Human, Pair 11 ,Haplotype ,Chromosome Mapping ,Cell Biology ,Middle Aged ,Haplotypes ,Genetic marker ,Chromosomal region ,Female ,Polymorphism, Restriction Fragment Length - Abstract
Studies in centenarians recently found that an STR marker of the Tyrosine Hydroxylase (TH, 11p15.5) gene is associated with human longevity. The aim of the present study was to continue the exploration of the 11p15.5 chromosomal region in human longevity by analyzing two additional RFLP markers, which lie in the Insulin (INS) and Insulin Growth Factor 2 (IGF2) genes. Both genes, which are located downstream of TH, are indeed good candidates in longevity, as ascertained on the basis of laboratory studies in experimental models. Neither the INS nor the IGF2 marker revealed an association with longevity. Nevertheless, linkage disequilibrium analyses showed sex-specific longevity associations defined by both TH-INS and TH-IGF2 haplotypes. On the whole, the results reinforce the involvement of the chromosomal region spanning from the TH to the IGF2 loci in controlling the longevity phenotype in humans.
- Published
- 2001
33. A Statistician Looks for Human Disease Genes
- Author
-
Bruce S. Weir
- Subjects
Genetics ,Linkage disequilibrium ,medicine.medical_specialty ,Human disease ,Molecular genetics ,Realm ,medicine ,Disease ,Biology ,Gene ,Statistician - Abstract
Knowing the location of human disease genes is a first step towards a characterization of the gene, and an eventual understanding of the nature of the defect and the development of therapies or even cures. This process is necessarily within the realm of molecular genetics, but there does seem to be a role for a statistician in at least the early stages of a search. This role is generally centered on an examination of the joint behavior of the disease and a series of marker loci, either within families or within populations.
- Published
- 2000
34. A Bayesian characterization of Hardy-Weinberg disequilibrium
- Author
-
Ian Painter, Jennifer Shoemaker, and Bruce S. Weir
- Subjects
Genotype ,Bayesian probability ,Disequilibrium ,Population ,Biology ,Conjugate prior ,Linkage Disequilibrium ,Bayes' theorem ,Prior probability ,Statistics ,Genetics ,medicine ,Animals ,Humans ,education ,Alleles ,Probability ,education.field_of_study ,Models, Genetic ,Bayes Theorem ,Weighting ,Genetics, Population ,medicine.symptom ,Bayesian linear regression ,Research Article - Abstract
A Bayesian method for determining if there are large departures from independence between pairs of alleles at a locus, Hardy-Weinberg equilibrium (HWE), is presented. We endorse the view that a population will never be exactly in HWE and that there will be occasions when there is a need for an alternative to the usual hypothesis-testing setting. Bayesian methods provide such an alternative, and our approach differs from previous Bayesian treatments in using the disequilibrium and inbreeding coefficient parameterizations. These are easily interpretable but may be less mathematically tractable than other parameterizations. We examined the posterior distributions of our parameters for evidence that departures from HWE were large. For either parameterization, when a conjugate prior was used, the prior probability for small departures was itself small, i.e., the prior was weighted against small departures from independence. We could avoid this uneven weighting by using a step prior which gave equal weighting to both small and large departures from HWE. In most cases, the Bayesian methodology makes it clear that there are not enough data to draw a conclusion.
- Published
- 1998
35. Marker selection for the transmission/disequilibrium test, in recently admixed populations
- Author
-
E. R. Martin, R. W. Morris, Bruce S. Weir, and Norman L. Kaplan
- Subjects
Linkage disequilibrium ,Population ,Locus (genetics) ,Admixture ,Ancestry-informative marker ,Biology ,Linkage Disequilibrium ,Association ,Gene Frequency ,Genetics ,Humans ,Genetics(clinical) ,Allele ,education ,Allele frequency ,Genetics (clinical) ,Alleles ,education.field_of_study ,Transmission/disequilibrium test ,Models, Genetic ,Transmission disequilibrium test ,Complex trait ,Genetics, Population ,Microsatellite ,Genome scan ,Research Article - Abstract
Summary: Recent admixture between genetically differentiated populations can result in high levels of association between alleles at loci that are ≤10 cM apart. The transmission/disequilibrium test (TDT) proposed by Spielman et al. (1993) can be a powerful test of linkage between disease and marker loci in the presence of association and therefore could be a useful test of linkage in admixed populations. The degree of association between alleles at two loci depends on the differences in allele frequencies, at the two loci, in the founding populations; therefore, the choice of marker is important. For a multiallelic marker, one strategy that may improve the power of the TDT is to group marker alleles within a locus, on the basis of information about the founding populations and the admixed population, thereby collapsing the marker into one with fewer alleles. We have examined the consequences of collapsing a microsatellite into a two-allele marker, when two founding populations are assumed for the admixed population, and have found that if there is random mating in the admixed population, then typically there is a collapsing for which the power of the TDT is greater than that for the original microsatellite marker. A method is presented for finding the optimal collapsing that has minimal dependence on the disease and that uses estimates either of marker allele frequencies in the two founding populations or of marker allele frequencies in the current, admixed population and in one of the founding populations. Furthermore, this optimal collapsing is not always the collapsing with the largest difference in allele frequencies in the founding populations. To demonstrate this strategy, we considered a recent data set, published previously, that provides frequency estimates for 30 microsatellites in 13 populations.
- Published
- 1998
36. Exact tests for association between alleles at arbitrary numbers of loci
- Author
-
Dmitri V. Zaykin, Lev A. Zhivotovsky, and Bruce S. Weir
- Subjects
Genetics ,Linkage disequilibrium ,Heterozygote ,Genotype ,Conditional probability ,Plant Science ,General Medicine ,Biology ,Hardy–Weinberg principle ,Set (abstract data type) ,Exact test ,Genetics, Population ,Gene Frequency ,Insect Science ,Statistical significance ,Data Interpretation, Statistical ,Statistics ,Humans ,Animal Science and Zoology ,Multinomial distribution ,Allele ,Algorithms ,Alleles ,Probability - Abstract
Associations between allelic frequencies, within and between loci, can be tested for with an exact test. The probability of the set of multi-locus genotypes in a sample, conditional on the allelic counts, is calculated from multinomial theory under the hypothesis of no association. Alleles are then permuted and the conditional probability calculated for the permuted genotypic array. The proportion of arrays no more probable than the original sample provides the significance level for the test. An algorithm is provided for counting genotypes efficiently in the arrays, and the powers of the test presented for various kinds of association. The powers for the case when associations are generated by admixture of several populations suggest that exact tests are capable of detecting levels of association that would affect forensic calculations to a significant extent.
- Published
- 1995
37. Independence of VNTR alleles defined as fixed bins
- Author
-
Bruce S. Weir
- Subjects
Linkage disequilibrium ,Databases, Factual ,Population ,Disequilibrium ,Population genetics ,Biology ,Investigations ,California ,Gene Frequency ,Genetics ,medicine ,Ethnicity ,Humans ,education ,Allele frequency ,Alleles ,Repetitive Sequences, Nucleic Acid ,education.field_of_study ,GYPA ,Models, Genetic ,DNA ,Texas ,Variable number tandem repeat ,Genetics, Population ,Data Interpretation, Statistical ,Florida ,Restriction fragment length polymorphism ,medicine.symptom ,Polymorphism, Restriction Fragment Length - Abstract
An analysis is presented of data collected by the Federal Bureau of Investigation at six unlinked variable number of tandem repeats (VNTR) loci for the United States population. Databases have been constructed of VNTR profiles of Caucasians, Blacks and Hispanics from Florida, Texas and California. There was very little evidence for correlations between lengths for pairs of VNTR fragments, within or between loci. When the fragment lengths were amalgamated into discrete bins, there was also little evidence for disequilibrium over all genotypes, within or between loci, for the Caucasian database, although some disequilibrium was found for the Black and Hispanic databases. No disequilibrium was found for the Caucasian or Black databases when tests were confined to heterozygous individuals. In cases of global disequilibrium, local tests can be applied to specific genotypes. The results suggest that, at the bin level, frequencies of VNTR profiles can generally be estimated as the products of the frequencies of the constituent elements. This overcomes the problem of estimating population frequencies when any particular profile does not exist in the database. There is some evidence for different frequencies, at the individual bin level, between geographic samples within each of the Caucasian, Black and Hispanic databases, and considerable evidence for differences between the three databases. These differences are less evident for the frequencies of four-locus profiles.
- Published
- 1992
38. Prenatal diagnosis and linkage disequilibrium with cystic fibrosis for markers surrounding D7S8
- Author
-
Maurizio Ferrari, Silvia Russo, Dicky J. J. Halley, Michael Dean, Francis S. Collins, Giovanni Romeo, Kenneth Ward, Bruce S. Weir, Michael C. Iannuzzi, Jean A. Amos, Paula Finn, Ben A. Oostra, Jennifer R. Lynch, and Marcella Devoto
- Subjects
Genetic Markers ,0106 biological sciences ,Linkage disequilibrium ,Cystic Fibrosis ,Prenatal diagnosis ,Locus (genetics) ,Biology ,01 natural sciences ,Cystic fibrosis ,Linkage Disequilibrium ,03 medical and health sciences ,Genetic linkage ,Prenatal Diagnosis ,Genetics ,medicine ,Humans ,Genetics (clinical) ,030304 developmental biology ,0303 health sciences ,medicine.disease ,Human genetics ,Pedigree ,3. Good health ,Genetic marker ,Restriction fragment length polymorphism ,DNA Probes ,Polymorphism, Restriction Fragment Length ,010606 plant biology & botany - Abstract
Three polymorphic DNA markers surrounding the D7S8 locus were tested for their usefulness in the diagnosis of cystic fibrosis (CF) by linkage analysis. The markers correspond to the loci D7S424 and D7S426. These polymorphisms were studied by centers in the U.S., the United Kingdom, the Netherlands, and Italy, using samples from populations throughout Europe and North America. The additional information provided by these probes increased the heterogeneity of the region from 50% to 58% and was essential for a completely informative diagnosis in one family. A very high degree of linkage disequilibrium was found between these markers, which span a distance of approximately 250 kb. In addition, linkage disequilibrium with CF was noted. Significant heterogeneity of linkage disequilibrium was found among the populations, both for the marker-marker pairs and between the markers and CF.
- Published
- 1990
39. The variance of sample heterozygosity
- Author
-
Bruce S. Weir, John V. Reynolds, and Ken G. Dodds
- Subjects
Linkage (software) ,Linkage disequilibrium ,Heterozygote ,Models, Genetic ,Population genetics ,Selfing ,Chromosome Mapping ,Genetic Variation ,Gene Pool ,Biology ,Mating system ,Loss of heterozygosity ,Genetics, Population ,Gene Frequency ,Statistics ,Mutation (genetic algorithm) ,Humans ,Mating ,Selection, Genetic ,Ecology, Evolution, Behavior and Systematics ,Alleles - Abstract
The variance of sample heterozygosity, averaged over several loci, is studied in a variety of situations. The variance depends on the sampling implicit in the mating system as well as on that explicit in the loci scored and individuals sampled. There are also effects of allelic distributions over loci and of linkage or linkage disequilibrium between pairs of loci. Results are obtained for populations in drift and mutation balance, for infinite populations undergoing mixed self and random mating, and for finite monoecious populations with or without selfing. For unlinked loci in drift/mutation balance, variances appear to be lessened more by increasing the number of loci scored than by increasing the number of individuals sampled. For infinite populations under the mixed self and random mating system, however, the reverse is true. Methods for estimating the variance of sample heterozygosity are discussed, with attention being paid to unbalanced data where not all loci are scored in all individuals.
- Published
- 1990
40. Estimation of linkage disequilibrium in randomly mating populations
- Author
-
C. Clark Cockerham and Bruce S. Weir
- Subjects
Estimation ,Genetics ,Linkage disequilibrium ,Iterative method ,Maximum likelihood ,technology, industry, and agriculture ,Biology ,complex mixtures ,Quantitative Biology::Genomics ,biological sciences ,Statistics ,Quantitative Biology::Populations and Evolution ,lipids (amino acids, peptides, and proteins) ,natural sciences ,Mating ,Cubic function ,Genetics (clinical) - Abstract
The maximum likelihood method for estimating linkage disequilibrium from genotypic data for randomly mating populations is studied. Instead of iterative methods for finding a root of the cubic equation for one of the gametic frequencies (Hill, 1974), it is recommended that the cubic be solved completely. For data with some missing genotypic classes, it is further recommended that explicit solutions for the cubic be used.
- Published
- 1979
41. Digenic descent measures for finite populations
- Author
-
Bruce S. Weir and C. Clark Cockerham
- Subjects
Linkage disequilibrium ,education.field_of_study ,Population ,Pedigree chart ,General Medicine ,Mating system ,Complete linkage ,Statistics ,Genetics ,Mating ,education ,Inbreeding ,Mathematics ,Descent (mathematics) - Abstract
Summary: The development of a set of two-locus descent measures is reviewed. The three digenic measures, inbreeding coefficient and parental and recombinant descent coefficients, are considered in detail. The derivations of these three in pedigrees, fixed mating systems, and random mating in monoecious or dioecious populations are given. General expressions for digenic frequencies and disequilibria functions at any time are found by applying the three digenic descent measures to two types of initial populations. The final or equilibrium status of the population is also given. As the inbreeding coefficient is the same as the recombinant descent coefficient in the case of complete linkage, avoidance or promotion of early inbreeding has similar effects on the two coefficients. Estimable components of linkage disequilibrium and other measures of association within and among populations are elaborated.
- Published
- 1977
42. The effects of linkage and linkage disequilibrium on the covariances of noninbred relatives
- Author
-
Bruce S. Weir, John V. Reynolds, and C. Clark Cockerham
- Subjects
Linkage (software) ,Linkage disequilibrium ,Dominance (ethology) ,Assortative mating ,Statistics ,Genetics ,Epistasis ,Quantitative trait locus ,Biology ,Association mapping ,Genetics (clinical) ,Genetic association - Abstract
The effects of linkage and linkage disequilibrium on the genetic variances and covariances of noninbred relatives are formulated for quantitative traits with additive and dominance effects but without epistasis. Assortative mating is excluded. Linkage disequilibrium between two loci introduces a covariance between their additive effects and between their dominance effects. The usual coefficients of additive and dominance variances found by counting paths through common ancestors suffice to express the covariances of relatives, which now include the additive and dominance covariances. The linkage parameter, or recombination fraction, comes into play only when relating the additive or dominance covariances from one generation to another.
- Published
- 1980
43. Temporal and Microgeographic Variation in Allozyme Frequencies in a Natural Population of Drosophila buzzatii
- Author
-
Jsf Barker, Bruce S. Weir, and P. D. East
- Subjects
Heterozygote ,Linkage disequilibrium ,Genotype ,Climate ,Population ,Locus (genetics) ,Investigations ,Biology ,Gene flow ,Gene Frequency ,Genetics ,Animals ,education ,Allele frequency ,Alleles ,Demography ,education.field_of_study ,Geography ,Australia ,Genetic Variation ,Genotype frequency ,Isoenzymes ,Genes ,Natural population growth ,Drosophila ,Inbreeding - Abstract
Temporal variation in allozyme frequencies at six loci was studied by making monthly collections over 4 yr in one population of the cactophilic species Drosophila buzzatii. Ten sites were defined within the study locality, and for all temporal samples, separate collections were made at each of these sites. Population structure over microgeographic space and changes in population structure over time were analyzed using F-statistic estimators, and multivariate analyses of allele and genotype frequencies with environmental variables were carried out. Allele frequencies showed significant variation over time, although there were no clear cyclical or seasonal patterns. A biplot analysis of allele frequencies over seasons within years and over years showed clear discrimination among years by alleles at four loci. During the 4 yr, three alleles showed directional changes which were associated with directional changes in environmental variables. Significant associations with one or more environmental variables were found for allele frequencies at every locus and for both expected and observed heterozygosities (except those for Est-1 and Est-2). Thus, variation in allele frequencies over time cannot be attributed solely to drift. Significant linkage disequilibria were detected among three loci (Est-2, Hex and Aldox), but there was no evidence for spatial or temporal patterns. The F-statistic analyses showed significant differentiation among months within years for all loci, but the statistic used (coancestry) was heterogeneous among loci. Estimates of F (inbreeding) for all loci were significantly different from zero, with the loci in four groups, Adh-1 (negative), Pgm (small positive), Est-2 and Hex (intermediate) and Est-1 and Aldox (high positive). The correlation of genes within individuals within populations (f) for each locus in each month by site sample differed among loci, as did the patterns of change in f over time (seasons). Heterogeneity in the F-statistic estimates indicates that natural selection is directly or indirectly affecting allele and genotype frequencies at some loci. However, the F-statistic analyses showed essentially no microgeographic structure (i.e., among sites), although there was significant heterogeneity in allele frequencies among flies emerging from individual rots. Thus, microspatial heterogeneity probably is most important at the level of individual rots, and coupled with habitat selection, it could be a major factor promoting diversifying selection and the maintenance of polymorphism. Resolution of the nature of this selection and of the apparent inbreeding detected at all loci except Adh-1 will require detailed study of the breeding structure of the population at the microhabitat level (individual rots) and of gene flow within the population.
- Published
- 1986
44. Detecting Marker-Disease Association by Testing for Hardy-Weinberg Disequilibrium at a Marker Locus
- Author
-
Margaret G. Ehm, Bruce S. Weir, and Dahlia M. Nielsen
- Subjects
Genetics ,Linkage disequilibrium ,education.field_of_study ,Fine mapping ,Disequilibrium ,Population ,Hardy-Weinberg disequilibrium ,Case-control test ,Locus (genetics) ,Biology ,Complex trait ,Marker-disease association ,Genetic marker ,medicine ,Genetics(clinical) ,medicine.symptom ,Allele ,Association mapping ,education ,Genetics (clinical) ,Genetic association ,Research Article - Abstract
Summary: We review and extend a recent suggestion that fine-scale localization of a disease-susceptibility locus for a complex disease be done on the basis of deviations from Hardy-Weinberg equilibrium among affected individuals. This deviation is driven by linkage disequilibrium between disease and marker loci in the whole population and requires a heterogeneous genetic basis for the disease. A finding of marker-locus Hardy-Weinberg disequilibrium therefore implies disease heterogeneity and marker-disease linkage disequilibrium. Although a lack of departure of Hardy-Weinberg disequilibrium at marker loci implies that disease susceptibility–weighted linkage disequilibria are zero, given disease heterogeneity, it does not follow that the usual measures of linkage disequilibrium are zero. For disease-susceptibility loci with more than two alleles, therefore, care is needed in the drawing of inferences from marker Hardy-Weinberg disequilibria.
45. 6. Complete Characterization Of Disequilibrium At Two Loci
- Author
-
Bruce S. Weir and C. Clark Cockerham
- Subjects
Linkage disequilibrium ,Evolutionary biology ,Disequilibrium ,medicine ,Biology ,medicine.symptom ,Association mapping
- Published
- 1989
46. Effect of Mating Structure on Variation in Linkage Disequilibrium
- Author
-
W. G. Hill and Bruce S. Weir
- Subjects
Genetics ,education.field_of_study ,Linkage disequilibrium ,Models, Genetic ,Genetic Linkage ,Population ,Disequilibrium ,Population genetics ,Sampling (statistics) ,Biology ,Investigations ,Mating system ,Genetics, Population ,Effective population size ,Gene Frequency ,Evolutionary biology ,medicine ,medicine.symptom ,education ,Crosses, Genetic ,Mathematics ,Genetic association - Abstract
Measurement of linkage disequilibrium involves two sampling processes. First, there is the sampling of gametes in the population to form successive generations, and this generates disequilibrium dependent on the effective population size (Ne) and the mating structure. Second, there is sampling of a finite number (n) of individuals to estimate the population disequilibrium. Two-locus descent measures are used to describe the mating system and are transformed to disequilibrium moments at the final sampling. Approximate eigenvectors for the transition matrix of descent measures are used to obtain formulae for the variance of the observed disequilibria as a function of Ne, mating structure, n, and linkage or recombination parameter. The variance of disequilibrium is the same for monoecious populations with or without random selfing and for dioecious populations with random pairing for each progeny. With monogamy, the variance is slightly higher, the proportional difference being greater for unlinked loci.
- Published
- 1980
47. Variances and covariances of squared linkage disequilibria in finite populations
- Author
-
William G. Hill and Bruce S. Weir
- Subjects
Linkage (software) ,Recombination, Genetic ,Linkage disequilibrium ,Base Sequence ,Models, Genetic ,Genetic Linkage ,Disequilibrium ,Biology ,Restriction site ,Genetics, Population ,Gene Frequency ,Mutation (genetic algorithm) ,Statistics ,Mutation ,medicine ,Gene conversion ,medicine.symptom ,Ecology, Evolution, Behavior and Systematics - Abstract
Analysis of linkage disequilibrium D among restriction sites or bases in DNA sequences, arising from mutations in finite populations, depends on a knowledge of the variance-covariance structure of measures such as D² between different pairs of sites. This requires evaluation of the eighth moments of gene frequencies among two, three, and four loci, and the necessary methodology is derived here and results are computed. While primary emphasis is placed on disequilibrium arising from mutation or gene conversion, the methodology also allows for the joint effects of only drift and recombination. Numerical results confirm that squared linkage disequilibria can have high variances and covariances.
- Published
- 1988
48. Behavior of pairs of loci in finite monoecious populations
- Author
-
Bruce S. Weir and C. Clark Cockerham
- Subjects
Genetics ,Linkage disequilibrium ,Genetics, Population ,Plant reproductive morphology ,Mating ,Biology ,Models, Biological ,Ecology, Evolution, Behavior and Systematics ,Mathematics - Abstract
Transition equations are established for descent measures for pairs of loci in finite randomly mating monoecious populations. The two special cases of equal chance gamete formation and two gametes per parent are considered in detail. The descent measures allow genotypic frequencies to be found but are used mainly to evaluate three quadrigenic moments, including the variance of the linkage disequilibrium. Numerical properties of these moments are compared to previously reported values.
- Published
- 1974
49. Testing Hypotheses about Linkage Disequilibrium with Multiple Alleles
- Author
-
Bruce S. Weir and C. Clark Cockerham
- Subjects
Genetics ,Linkage disequilibrium ,Locus (genetics) ,Multiple alleles ,sense organs ,Allele ,Biology ,Investigations ,skin and connective tissue diseases - Abstract
For loci with multiple alleles, hypotheses about linkage disequilibrium may be tested on the complete set of gametic data, or on various collapsed sets of data. Collapsing data into a few alleles at each locus can change the power of the tests, as implied in a recent paper by Zouros, Golding and Mackay (1977). We show that the nature of such changes can be found from properties of the noncentral chi-square distribution, and that the magnitude and direction of these changes depend on the levels of linkage disequilibria, allelic frequencies and degrees of freedom.
- Published
- 1978
50. Allozymic Variation and Linkage Disequilibrium in Some Laboratory Populations of Drosophila melanogaster
- Author
-
C. C. Laurie-Ahlberg and Bruce S. Weir
- Subjects
Genetics ,Linkage (software) ,Loss of heterozygosity ,Linkage disequilibrium ,biology ,Genotype ,Overdominance ,Allele ,Drosophila melanogaster ,Investigations ,biology.organism_classification ,Genetic association - Abstract
Nine laboratory populations of D. melanogaster were surveyed by starch gel electrophoresis for variation at 17 enzyme loci. A single-fly extract could be assayed for all 17 enzymes, so that the data consist of 17-locus genotypes. Pairwise linkage disequilibria were estimated from the multilocus genotypic frequencies, using both Burrows' and Hill's methods. Large amounts of linkage disequilibrium were found, in contrast to the results reported for natural populations. Knowledge of the approximate sizes of these populations was used to compare the observed heterozygosities and linkage disequilibria with predictions of the neutral allele hypothesis. The relatively large amount of linkage disequilibrium is consistent with the small sizes of the populations. However, the levels of heterozygosity in at least some populations suggest that some mechanism has been operating to retard the rate of decay by random drift. Several examples of significant deviation from Hardy-Weinberg frequencies and the large amount of linkage disequilibrium present in these populations indicate that a likely mechanism is selective effects associated with neutral alleles because of linkage disequilibrium with selected loci (e.g., "associative overdominance"). The results are therefore consistent with both neutralist and selectionist hypotheses, but suggest the importance of considering linkage disequilibrium between neutral and selected loci when attempting to explain the dynamics of enzyme polymorphisms.
- Published
- 1979