43 results on '"Hubisz MJ"'
Search Results
2. A Model-Based Analysis of GC-Biased Gene Conversion in the Human and Chimpanzee Genomes
- Author
-
Capra, JA, Hubisz, MJ, Kostka, D, Pollard, KS, and Siepel, A
- Abstract
GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available. © 2013 Capra et al.
- Published
- 2013
3. Lineage-specific intolerance to oncogenic drivers restricts histological transformation.
- Author
-
Gardner EE, Earlie EM, Li K, Thomas J, Hubisz MJ, Stein BD, Zhang C, Cantley LC, Laughney AM, and Varmus H
- Subjects
- Humans, Epithelial Cells pathology, Lung pathology, Oncogenes, Cell Lineage, Molecular Targeted Therapy, Adenocarcinoma of Lung genetics, Adenocarcinoma of Lung pathology, Adenocarcinoma of Lung therapy, Lung Neoplasms genetics, Lung Neoplasms pathology, Lung Neoplasms therapy, Small Cell Lung Carcinoma genetics, Small Cell Lung Carcinoma pathology, Small Cell Lung Carcinoma therapy, Proto-Oncogene Proteins c-myc genetics, Proto-Oncogene Proteins c-akt genetics
- Abstract
Lung adenocarcinoma (LUAD) and small cell lung cancer (SCLC) are thought to originate from different epithelial cell types in the lung. Intriguingly, LUAD can histologically transform into SCLC after treatment with targeted therapies. In this study, we designed models to follow the conversion of LUAD to SCLC and found that the barrier to histological transformation converges on tolerance to Myc, which we implicate as a lineage-specific driver of the pulmonary neuroendocrine cell. Histological transformations are frequently accompanied by activation of the Akt pathway. Manipulating this pathway permitted tolerance to Myc as an oncogenic driver, producing rare, stem-like cells that transcriptionally resemble the pulmonary basal lineage. These findings suggest that histological transformation may require the plasticity inherent to the basal stem cell, enabling tolerance to previously incompatible oncogenic driver programs.
- Published
- 2024
- Full Text
- View/download PDF
4. Non-cell-autonomous cancer progression from chromosomal instability.
- Author
-
Li J, Hubisz MJ, Earlie EM, Duran MA, Hong C, Varela AA, Lettera E, Deyell M, Tavora B, Havel JJ, Phyu SM, Amin AD, Budre K, Kamiya E, Cavallo JA, Garris C, Powell S, Reis-Filho JS, Wen H, Bettigole S, Khan AJ, Izar B, Parkes EE, Laughney AM, and Bakhoum SF
- Subjects
- Humans, Benchmarking, Cell Communication, Colorectal Neoplasms drug therapy, Colorectal Neoplasms genetics, Colorectal Neoplasms immunology, Colorectal Neoplasms pathology, Melanoma drug therapy, Melanoma genetics, Melanoma immunology, Melanoma pathology, Tumor Microenvironment, Interferon Type I immunology, Neoplasm Metastasis, Endoplasmic Reticulum Stress, Signal Transduction, Triple Negative Breast Neoplasms drug therapy, Triple Negative Breast Neoplasms genetics, Triple Negative Breast Neoplasms immunology, Triple Negative Breast Neoplasms pathology, Chromosomal Instability, Disease Progression, Neoplasms genetics, Neoplasms immunology, Neoplasms pathology
- Abstract
Chromosomal instability (CIN) is a driver of cancer metastasis, yet the extent to which this effect depends on the immune system remains unknown. Using ContactTracing, a newly developed, validated and benchmarked tool to infer the nature and conditional dependence of cell-cell interactions from single-cell transcriptomic data, we show that CIN-induced chronic activation of the cGAS-STING pathway promotes downstream signal re-wiring in cancer cells, leading to a pro-metastatic tumour microenvironment. This re-wiring is manifested by type I interferon tachyphylaxis selectively downstream of STING and a corresponding increase in cancer cell-derived endoplasmic reticulum (ER) stress response. Reversal of CIN, depletion of cancer cell STING or inhibition of ER stress response signalling abrogates CIN-dependent effects on the tumour microenvironment and suppresses metastasis in immune competent, but not severely immune compromised, settings. Treatment with STING inhibitors reduces CIN-driven metastasis in melanoma, breast and colorectal cancers in a manner dependent on tumour cell-intrinsic STING. Finally, we show that CIN and pervasive cGAS activation in micronuclei are associated with ER stress signalling, immune suppression and metastasis in human triple-negative breast cancer, highlighting a viable strategy to identify and therapeutically intervene in tumours spurred by CIN-induced inflammation. (© 2023. The Author(s).)
- Published
- 2023
- Full Text
- View/download PDF
5. The evolution of the human DNA replication timing program.
- Author
-
Bracci AN, Dallmann A, Ding Q, Hubisz MJ, Caballero M, and Koren A
- Subjects
- Animals, Humans, Macaca mulatta genetics, Phylogeny, Eukaryota, Pan troglodytes genetics, DNA Replication Timing genetics
- Abstract
DNA is replicated according to a defined spatiotemporal program that is linked to both gene regulation and genome stability. The evolutionary forces that have shaped replication timing programs in eukaryotic species are largely unknown. Here, we studied the molecular causes and consequences of replication timing evolution across 94 humans, 95 chimpanzees, and 23 rhesus macaques. Replication timing differences recapitulated the species' phylogenetic tree, suggesting continuous evolution of the DNA replication timing program in primates. Hundreds of genomic regions had significant replication timing variation between humans and chimpanzees, of which 66 showed advances in replication origin firing in humans, while 57 were delayed. Genes overlapping these regions displayed correlated changes in expression levels and chromatin structure. Many human-chimpanzee variants also exhibited interindividual replication timing variation, pointing to ongoing evolution of replication timing at these loci. Association of replication timing variation with genetic variation revealed that DNA sequence evolution can explain replication timing variation between species. Taken together, DNA replication timing shows substantial and ongoing evolution in the human lineage that is driven by sequence alterations and could impact regulatory evolution at specific genomic sites.
- Published
- 2023
- Full Text
- View/download PDF
6. Genomic islands of differentiation in a rapid avian radiation have been driven by recent selective sweeps.
- Author
-
Hejase HA, Salman-Minkov A, Campagna L, Hubisz MJ, Lovette IJ, Gronau I, and Siepel A
- Subjects
- Animals, Biodiversity, Genetic Variation, Machine Learning, Genomic Islands, Models, Genetic
- Abstract
Numerous studies of emerging species have identified genomic "islands" of elevated differentiation against a background of relative homogeneity. The causes of these islands remain unclear, however, with some signs pointing toward "speciation genes" that locally restrict gene flow and others suggesting selective sweeps that have occurred within nascent species after speciation. Here, we examine this question through the lens of genome sequence data for five species of southern capuchino seedeaters, finch-like birds from South America that have undergone a species radiation during the last ∼50,000 generations. By applying newly developed statistical methods for ancestral recombination graph inference and machine-learning methods for the prediction of selective sweeps, we show that previously identified islands of differentiation in these birds appear to be generally associated with relatively recent, species-specific selective sweeps, most of which are predicted to be soft sweeps acting on standing genetic variation. Many of these sweeps coincide with genes associated with melanin-based variation in plumage, suggesting a prominent role for sexual selection. At the same time, a few loci also exhibit indications of possible selection against gene flow. These observations shed light on the complex manner in which natural selection shapes genome sequences during speciation. Competing Interests: The authors declare no competing interest.
- Published
- 2020
- Full Text
- View/download PDF
7. Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph.
- Author
-
Hubisz MJ, Williams AL, and Siepel A
- Subjects
- Animals, Evolution, Molecular, Human Migration, Humans, Gene Flow, Models, Genetic, Neanderthals genetics, Population genetics, Recombination, Genetic
- Abstract
The sequencing of Neanderthal and Denisovan genomes has yielded many new insights about interbreeding events between extinct hominins and the ancestors of modern humans. While much attention has been paid to the relatively recent gene flow from Neanderthals and Denisovans into modern humans, other instances of introgression leave more subtle genomic evidence and have received less attention. Here, we present a major extension of the ARGweaver algorithm, called ARGweaver-D, which can infer local genetic relationships under a user-defined demographic model that includes population splits and migration events. This Bayesian algorithm probabilistically samples ancestral recombination graphs (ARGs) that specify not only tree topologies and branch lengths along the genome, but also indicate migrant lineages. The sampled ARGs can therefore be parsed to produce probabilities of introgression along the genome. We show that this method is well powered to detect the archaic migration into modern humans, even with only a few samples. We then show that the method can also detect introgressed regions stemming from older migration events, or from unsampled populations. We apply it to human, Neanderthal, and Denisovan genomes, looking for signatures of older proposed migration events, including ancient humans into Neanderthal, and unknown archaic hominins into Denisovans. We identify 3% of the Neanderthal genome that is putatively introgressed from ancient humans, and estimate that the gene flow occurred between 200-300 kya. We find no convincing evidence that negative selection acted against these regions. Finally, we predict that 1% of the Denisovan genome was introgressed from an unsequenced, but highly diverged, archaic hominin ancestor. About 15% of these "super-archaic" regions (comprising at least about 4 Mb) were, in turn, introgressed into modern humans and continue to exist in the genomes of people alive today. Competing Interests: No authors have competing interests.
- Published
- 2020
- Full Text
- View/download PDF
8. Parallel evolution of ancient, pleiotropic enhancers underlies butterfly wing pattern mimicry.
- Author
-
Lewis JJ, Geltman RC, Pollak PC, Rondem KE, Van Belleghem SM, Hubisz MJ, Munn PR, Zhang L, Benson C, Mazo-Vargas A, Danko CG, Counterman BA, Papa R, and Reed RD
- Subjects
- Adaptation, Physiological genetics, Animals, CRISPR-Cas Systems, Chimera, Evolution, Molecular, Genome, Insect, Genome-Wide Association Study, Insect Proteins genetics, Phylogeny, Pigmentation genetics, Promoter Regions, Genetic, Regulatory Sequences, Nucleic Acid, Butterflies physiology, Enhancer Elements, Genetic, Genetic Pleiotropy, Pigmentation physiology, Wings, Animal physiology
- Abstract
Color pattern mimicry in Heliconius butterflies is a classic case study of complex trait adaptation via selection on a few large effect genes. Association studies have linked color pattern variation to a handful of noncoding regions, yet the presumptive cis-regulatory elements (CREs) that control color patterning remain unknown. Here we combine chromatin assays, DNA sequence associations, and genome editing to functionally characterize 5 cis-regulatory elements of the color pattern gene optix. We were surprised to find that the cis-regulatory architecture of optix is characterized by pleiotropy and regulatory fragility, where deletion of individual cis-regulatory elements has broad effects on both color pattern and wing vein development. Remarkably, we found orthologous cis-regulatory elements associate with wing pattern convergence of distantly related comimics, suggesting that parallel coevolution of ancestral elements facilitated pattern mimicry. Our results support a model of color pattern evolution in Heliconius where changes to ancient, multifunctional cis-regulatory elements underlie adaptive radiation. Competing Interests: The authors declare no competing interest.
- Published
- 2019
- Full Text
- View/download PDF
9. SweepFinder2: increased sensitivity, robustness and flexibility.
- Author
-
DeGiorgio M, Huber CD, Hubisz MJ, Hellmann I, and Nielsen R
- Subjects
- Evolution, Molecular, Humans, Likelihood Functions, Mutation Rate, Selection, Genetic, Software
- Abstract
SweepFinder is a widely used program that implements a powerful likelihood-based method for detecting recent positive selection, or selective sweeps. Here, we present SweepFinder2, an extension of SweepFinder with increased sensitivity and robustness to the confounding effects of mutation rate variation and background selection. Moreover, SweepFinder2 has increased flexibility that enables the user to specify test sites, set the distance between test sites and utilize a recombination map. Availability and Implementation: SweepFinder2 is a freely available (www.personal.psu.edu/mxd60/sf2.html) software package that is written in C and can be run from a Unix command line. Contact: mxd60@psu.edu. (© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.)
- Published
- 2016
- Full Text
- View/download PDF
10. Ancient gene flow from early modern humans into Eastern Neanderthals.
- Author
-
Kuhlwilm M, Gronau I, Hubisz MJ, de Filippo C, Prado-Martinez J, Kircher M, Fu Q, Burbano HA, Lalueza-Fox C, de la Rasilla M, Rosas A, Rudan P, Brajkovic D, Kucan Ž, Gušic I, Marques-Bonet T, Andrés AM, Viola B, Pääbo S, Meyer M, Siepel A, and Castellano S
- Subjects
- Altitude, Animals, Bayes Theorem, Chromosomes, Human, Pair 21 genetics, Croatia ethnology, Genome, Human genetics, Genomics, Haplotypes genetics, Heterozygote, Humans, Hybridization, Genetic genetics, Phylogeny, Population Density, Siberia, Spain ethnology, Time Factors, Gene Flow genetics, Neanderthals genetics
- Abstract
It has been shown that Neanderthals contributed genetically to modern humans outside Africa 47,000-65,000 years ago. Here we analyse the genomes of a Neanderthal and a Denisovan from the Altai Mountains in Siberia together with the sequences of chromosome 21 of two Neanderthals from Spain and Croatia. We find that a population that diverged early from other modern humans in Africa contributed genetically to the ancestors of Neanderthals from the Altai Mountains roughly 100,000 years ago. By contrast, we do not detect such a genetic contribution in the Denisovan or the two European Neanderthals. We conclude that in addition to later interbreeding events, the ancestors of Neanderthals from the Altai Mountains and early modern humans met and interbred, possibly in the Near East, many thousands of years earlier than previously thought.
- Published
- 2016
- Full Text
- View/download PDF
11. A method for calculating probabilities of fitness consequences for point mutations across the human genome.
- Author
-
Gulko B, Hubisz MJ, Gronau I, and Siepel A
- Subjects
- Animals, Cell Line, Evolution, Molecular, Human Umbilical Vein Endothelial Cells, Humans, Pan troglodytes genetics, Polymorphism, Genetic, Probability, Regulatory Sequences, Nucleic Acid, Genetic Fitness, Genome, Human, Point Mutation
- Abstract
We describe a new computational method for estimating the probability that a point mutation at each position in a genome will influence fitness. These 'fitness consequence' (fitCons) scores serve as evolution-based measures of potential genomic function. Our approach is to cluster genomic positions into groups exhibiting distinct 'fingerprints' on the basis of high-throughput functional genomic data, then to estimate a probability of fitness consequences for each group from associated patterns of genetic polymorphism and divergence. We have generated fitCons scores for three human cell types on the basis of public data from ENCODE. In comparison with conventional conservation scores, fitCons scores show considerably improved prediction power for cis regulatory elements. In addition, fitCons scores indicate that 4.2-7.5% of nucleotides in the human genome have influenced fitness since the human-chimpanzee divergence, and they suggest that recent evolutionary turnover has had limited impact on the functional content of the genome.
- Published
- 2015
- Full Text
- View/download PDF
12. Exploring the genesis and functions of Human Accelerated Regions sheds light on their role in human evolution.
- Author
-
Hubisz MJ and Pollard KS
- Subjects
- Animals, Chromosome Mapping, DNA classification, Humans, Models, Genetic, Phylogeny, Conserved Sequence genetics, DNA genetics, Evolution, Molecular, Genome, Human genetics, Hominidae genetics
- Abstract
Human accelerated regions (HARs) are DNA sequences that changed very little throughout mammalian evolution, but then experienced a burst of changes in humans since divergence from chimpanzees. This unexpected evolutionary signature is suggestive of deeply conserved function that was lost or changed on the human lineage. Since their discovery, the actual roles of HARs in human evolution have remained somewhat elusive, due to their being almost exclusively non-coding sequences with no annotation. Ongoing research is beginning to crack this problem by leveraging new genome sequences, functional genomics data, computational approaches, and genetic assays to reveal that many HARs are developmental gene regulatory elements and RNA genes, most of which evolved their uniquely human mutations through positive selection before divergence of archaic hominins and diversification of modern humans. (Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.)
- Published
- 2014
- Full Text
- View/download PDF
13. Genome-wide inference of ancestral recombination graphs.
- Author
-
Rasmussen MD, Hubisz MJ, Gronau I, and Siepel A
- Subjects
- Algorithms, Computer Simulation, Humans, Markov Chains, Models, Genetic, Monte Carlo Method, Evolution, Molecular, Genome, Human, Recombination, Genetic, Selection, Genetic genetics
- Abstract
The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n-1 chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps.
- Published
- 2014
- Full Text
- View/download PDF
14. Genome-wide inference of natural selection on human transcription factor binding sites.
- Author
-
Arbiza L, Gronau I, Aksoy BA, Hubisz MJ, Gulko B, Keinan A, and Siepel A
- Subjects
- Animals, Base Sequence, Binding Sites genetics, Chromosome Mapping, Computer Simulation, Genome-Wide Association Study, Humans, Models, Genetic, Models, Statistical, Mutation physiology, Regulatory Sequences, Nucleic Acid genetics, Substrate Specificity, Genome, Human genetics, Selection, Genetic genetics, Transcription Factors metabolism
- Abstract
For decades, it has been hypothesized that gene regulation has had a central role in human evolution, yet much remains unknown about the genome-wide impact of regulatory mutations. Here we use whole-genome sequences and genome-wide chromatin immunoprecipitation and sequencing data to demonstrate that natural selection has profoundly influenced human transcription factor binding sites since the divergence of humans from chimpanzees 4-6 million years ago. Our analysis uses a new probabilistic method, called INSIGHT, for measuring the influence of selection on collections of short, interspersed noncoding elements. We find that, on average, transcription factor binding sites have experienced somewhat weaker selection than protein-coding genes. However, the binding sites of several transcription factors show clear evidence of adaptation. Several measures of selection are strongly correlated with predicted binding affinity. Overall, regulatory elements seem to contribute substantially to both adaptive substitutions and deleterious polymorphisms with key implications for human evolution and disease.
- Published
- 2013
- Full Text
- View/download PDF
15. A model-based analysis of GC-biased gene conversion in the human and chimpanzee genomes.
- Author
-
Capra JA, Hubisz MJ, Kostka D, Pollard KS, and Siepel A
- Subjects
- Animals, Base Sequence, Chromosome Mapping, Genome, Humans, Mammals, Models, Theoretical, Recombination, Genetic, Sequence Alignment, Evolution, Molecular, Gene Conversion genetics, Pan troglodytes genetics, Phylogeny, Selection, Genetic
- Abstract
GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available. Competing Interests: The authors have declared that no competing interests exist.
- Published
- 2013
- Full Text
- View/download PDF
16. Replacing and additive horizontal gene transfer in Streptococcus.
- Author
-
Choi SC, Rasmussen MD, Hubisz MJ, Gronau I, Stanhope MJ, and Siepel A
- Subjects
- Gene Duplication genetics, Genes, Bacterial genetics, Genes, Essential genetics, Humans, Models, Genetic, Selection, Genetic, Gene Transfer, Horizontal genetics, Phylogeny, Streptococcus genetics
- Abstract
The prominent role of Horizontal Gene Transfer (HGT) in the evolution of bacteria is now well documented, but few studies have differentiated between evolutionary events that predominantly cause genes in one lineage to be replaced by homologs from another lineage ("replacing HGT") and events that result in the addition of substantial new genomic material ("additive HGT"). Here, we make use of the distinct phylogenetic signatures of replacing and additive HGTs in a genome-wide study of the important human pathogen Streptococcus pyogenes (SPY) and its close relatives S. dysgalactiae subspecies equisimilis (SDE) and S. dysgalactiae subspecies dysgalactiae (SDD). Using recently developed statistical models and computational methods, we find evidence for abundant gene flow of both kinds within each of the SPY and SDE clades and of reduced levels of exchange between SPY and SDD. In addition, our analysis strongly supports a pronounced asymmetry in SPY-SDE gene flow, favoring the SPY-to-SDE direction. This finding is of particular interest in light of the recent increase in virulence of pathogenic SDE. We find much stronger evidence for SPY-SDE gene flow among replacing than among additive transfers, suggesting a primary influence from homologous recombination between co-occurring SPY and SDE cells in human hosts. Putative virulence genes are correlated with transfer events, but this correlation is found to be driven by additive, not replacing, HGTs. The genes affected by additive HGTs are enriched for functions having to do with transposition, recombination, and DNA integration, consistent with previous findings, whereas replacing HGTs seem to influence a more diverse set of genes. Additive transfers are also found to be associated with evidence of positive selection. These findings shed new light on the manner in which HGT has shaped pathogenic bacterial genomes.
- Published
- 2012
- Full Text
- View/download PDF
17. The role of GC-biased gene conversion in shaping the fastest evolving regions of the human genome.
- Author
-
Kostka D, Hubisz MJ, Siepel A, and Pollard KS
- Subjects
- Base Composition genetics, Base Sequence, Humans, Likelihood Functions, Sequence Alignment, Evolution, Molecular, Gene Conversion genetics, Genome, Human genetics, Models, Genetic, Selection, Genetic
- Abstract
GC-biased gene conversion (gBGC) is a recombination-associated evolutionary process that accelerates the fixation of guanine or cytosine alleles, regardless of their effects on fitness. gBGC can increase the overall rate of substitutions, a hallmark of positive selection. Many fast-evolving genes and noncoding sequences in the human genome have GC-biased substitution patterns, suggesting that gBGC, in contrast to adaptive processes, may have driven the human changes in these sequences. To investigate this hypothesis, we developed a substitution model for DNA sequence evolution that quantifies the nonlinear interacting effects of selection and gBGC on substitution rates and patterns. Based on this model, we used a series of lineage-specific likelihood ratio tests to evaluate sequence alignments for evidence of changes in mode of selection, action of gBGC, or both. With a false positive rate of less than 5% for individual tests, we found that the majority (76%) of previously identified human accelerated regions are best explained without gBGC, whereas a substantial minority (19%) are best explained by the action of gBGC alone. Further, more than half (55%) have substitution rates that significantly exceed local estimates of the neutral rate, suggesting that these regions may have been shaped by positive selection rather than by relaxation of constraint. By distinguishing the effects of gBGC, relaxation of constraint, and positive selection we provide an integrated analysis of the evolutionary forces that shaped the fastest evolving regions of the human genome, which facilitates the design of targeted functional studies of adaptation in humans.
- Published
- 2012
- Full Text
- View/download PDF
18. A high-resolution map of human evolutionary constraint using 29 mammals.
- Author
-
Lindblad-Toh K, Garber M, Zuk O, Lin MF, Parker BJ, Washietl S, Kheradpour P, Ernst J, Jordan G, Mauceli E, Ward LD, Lowe CB, Holloway AK, Clamp M, Gnerre S, Alföldi J, Beal K, Chang J, Clawson H, Cuff J, Di Palma F, Fitzgerald S, Flicek P, Guttman M, Hubisz MJ, Jaffe DB, Jungreis I, Kent WJ, Kostka D, Lara M, Martins AL, Massingham T, Moltke I, Raney BJ, Rasmussen MD, Robinson J, Stark A, Vilella AJ, Wen J, Xie X, Zody MC, Baldwin J, Bloom T, Chin CW, Heiman D, Nicol R, Nusbaum C, Young S, Wilkinson J, Worley KC, Kovar CL, Muzny DM, Gibbs RA, Cree A, Dihn HH, Fowler G, Jhangiani S, Joshi V, Lee S, Lewis LR, Nazareth LV, Okwuonu G, Santibanez J, Warren WC, Mardis ER, Weinstock GM, Wilson RK, Delehaunty K, Dooling D, Fronik C, Fulton L, Fulton B, Graves T, Minx P, Sodergren E, Birney E, Margulies EH, Herrero J, Green ED, Haussler D, Siepel A, Goldman N, Pollard KS, Pedersen JS, Lander ES, and Kellis M
- Subjects
- Animals, Disease, Exons genetics, Genomics, Health, Humans, Molecular Sequence Annotation, Phylogeny, RNA classification, RNA genetics, Selection, Genetic genetics, Sequence Alignment, Sequence Analysis, DNA, Evolution, Molecular, Genome genetics, Genome, Human genetics, Mammals genetics
- Abstract
The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.
- Published
- 2011
19. Bayesian inference of ancient human demography from individual genome sequences.
- Author
-
Gronau I, Hubisz MJ, Gulko B, Danko CG, and Siepel A
- Subjects
- Bayes Theorem, Chromosome Mapping, Evolution, Molecular, Gene Flow, Genetic Drift, Genetic Variation, Humans, Models, Genetic, Population Dynamics, Sequence Alignment, Validation Studies as Topic, Genetics, Population, Genome, Human, Population Density
- Abstract
Whole-genome sequences provide a rich source of information about human evolution. Here we describe an effort to estimate key evolutionary parameters based on the whole-genome sequences of six individuals from diverse human populations. We used a Bayesian, coalescent-based approach to obtain information about ancestral population sizes, divergence times and migration rates from inferred genealogies at many neutrally evolving loci across the genome. We introduce new methods for accommodating gene flow between populations and integrating over possible phasings of diploid genotypes. We also describe a custom pipeline for genotype inference to mitigate biases from heterogeneous sequencing technologies and coverage levels. Our analysis indicates that the San population of southern Africa diverged from other human populations approximately 108-157 thousand years ago, that Eurasians diverged from an ancestral African population 38-64 thousand years ago, and that the effective population size of the ancestors of all modern humans was ∼9,000.
- Published
- 2011
20. Error and error mitigation in low-coverage genome assemblies.
- Author
-
Hubisz MJ, Lin MF, Kellis M, and Siepel A
- Subjects
- Animals, Chromosome Mapping methods, Genome genetics, Genomics methods, Humans, Mammals genetics, Molecular Sequence Annotation methods, Sequence Analysis, DNA methods, Databases, Nucleic Acid standards, Molecular Sequence Annotation standards, Research Design, Sequence Analysis, DNA standards
- Abstract
The recent release of twenty-two new genome sequences has dramatically increased the data available for mammalian comparative genomics, but twenty of these new sequences are currently limited to ∼2× coverage. Here we examine the extent of sequencing error in these 2× assemblies, and its potential impact in downstream analyses. By comparing 2× assemblies with high-quality sequences from the ENCODE regions, we estimate the rate of sequencing error to be 1-4 errors per kilobase. While this error rate is fairly modest, sequencing error can still have surprising effects. For example, an apparent lineage-specific insertion in a coding region is more likely to reflect sequencing error than a true biological event, and the length distribution of coding indels is strongly distorted by error. We find that most errors are contributed by a small fraction of bases with low quality scores, in particular, by the ends of reads in regions of single-read coverage in the assembly. We explore several approaches for automatic sequencing error mitigation (SEM), making use of the localized nature of sequencing error, the fact that it is well predicted by quality scores, and information about errors that comes from comparisons across species. Our automatic methods for error mitigation cannot replace the need for additional sequencing, but they do allow substantial fractions of errors to be masked or eliminated at the cost of modest amounts of over-correction, and they can reduce the impact of error in downstream phylogenomic analyses. Our error-mitigated alignments are available for download.
- Published
- 2011
21. PHAST and RPHAST: phylogenetic analysis with space/time models.
- Author
-
Hubisz MJ, Pollard KS, and Siepel A
- Subjects
- Databases, Genetic, Genome, Information Storage and Retrieval methods, Internet, Genomics methods, Phylogeny, Software
- Abstract
The PHylogenetic Analysis with Space/Time models (PHAST) software package consists of a collection of command-line programs and supporting libraries for comparative genomics. PHAST is best known as the engine behind the Conservation tracks in the University of California, Santa Cruz (UCSC) Genome Browser. However, it also includes several other tools for phylogenetic modeling and functional element identification, as well as utilities for manipulating alignments, trees and genomic annotations. PHAST has been in development since 2002 and has now been downloaded more than 1000 times, but so far it has been released only as provisional ('beta') software. Here, we describe the first official release (v1.0) of PHAST, with improved stability, portability and documentation and several new features. We outline the components of the package and detail recent improvements. In addition, we introduce a new interface to the PHAST libraries from the R statistical computing environment, called RPHAST, and illustrate its use in a series of vignettes. We demonstrate that RPHAST can be particularly useful in applications involving both large-scale phylogenomics and complex statistical analyses. The R interface also makes the PHAST libraries accessible to non-C programmers, and is useful for rapid prototyping. PHAST v1.0 and RPHAST v1.0 are available for download at http://compgen.bscb.cornell.edu/phast, under the terms of an unrestrictive BSD-style license. RPHAST can also be obtained from the Comprehensive R Archive Network (CRAN; http://cran.r-project.org).
- Published
- 2011
22. Comparative genomic analysis of the Streptococcus dysgalactiae species group: gene content, molecular adaptation, and promoter evolution.
- Author
-
Suzuki H, Lefébure T, Hubisz MJ, Pavinski Bitar P, Lang P, Siepel A, and Stanhope MJ
- Subjects
- Animals, Bacterial Proteins metabolism, Cattle, Child, Genome, Bacterial, Humans, Molecular Sequence Data, Phylogeny, Streptococcus chemistry, Streptococcus classification, Streptococcus isolation & purification, Bacterial Proteins genetics, Cattle Diseases microbiology, Evolution, Molecular, Genomics, Promoter Regions, Genetic, Streptococcal Infections microbiology, Streptococcal Infections veterinary, Streptococcus genetics
- Abstract
Comparative genomics of closely related bacterial species with different pathogenesis and host preference can provide a means of identifying the specifics of adaptive differences. Streptococcus dysgalactiae (SD) is comprised of two subspecies: S. dysgalactiae subsp. equisimilis is both a human commensal organism and a human pathogen, and S. dysgalactiae subsp. dysgalactiae is strictly an animal pathogen. Here, we present complete genome sequences for both taxa, with analyses involving other species of Streptococcus but focusing on adaptation in the SD species group. We found little evidence for enrichment in biochemical categories of genes carried by each SD strain; however, differences in the virulence gene repertoire were apparent. Some of the differences could be ascribed to prophage and integrative conjugative elements. We identified approximately 9% of the nonrecombinant core genome to be under positive selection, some of which involved known virulence factors in other bacteria. Analyses of proteomes by pooling data across genes, by biochemical category, clade, or branch, provided evidence for increased rates of evolution in several gene categories, as well as external branches of the tree. Promoters were primarily evolving under purifying selection but with certain categories of genes evolving faster. Many of these fast-evolving categories were the same as those associated with rapid evolution in proteins. Overall, these results suggest that adaptation to changing environments and new hosts in the SD species group has involved the acquisition of key virulence genes along with selection of orthologous protein-coding loci and operon promoters.
- Published
- 2011
23. A simple genetic architecture underlies morphological variation in dogs.
- Author
-
Boyko AR, Quignon P, Li L, Schoenebeck JJ, Degenhardt JD, Lohmueller KE, Zhao K, Brisbin A, Parker HG, vonHoldt BM, Cargill M, Auton A, Reynolds A, Elkahloun AG, Castelhano M, Mosher DS, Sutter NB, Johnson GS, Novembre J, Hubisz MJ, Siepel A, Wayne RK, Bustamante CD, and Ostrander EA
- Subjects
- Animals, Body Size, Genome, Genome-Wide Association Study, Phenotype, Polymorphism, Single Nucleotide, Quantitative Trait Loci, Animals, Domestic anatomy & histology, Animals, Domestic genetics, Dogs anatomy & histology, Genetic Variation
- Abstract
Domestic dogs exhibit tremendous phenotypic diversity, including a greater variation in body size than any other terrestrial mammal. Here, we generate a high-density map of canine genetic variation by genotyping 915 dogs from 80 domestic dog breeds, 83 wild canids, and 10 outbred African shelter dogs across 60,968 single-nucleotide polymorphisms (SNPs). Coupling this genomic resource with external measurements from breed standards and individuals as well as skeletal measurements from museum specimens, we identify 51 regions of the dog genome associated with phenotypic variation among breeds in 57 traits. The complex traits include average breed body size and external body dimensions and cranial, dental, and long bone shape and size with and without allometric scaling. In contrast to the results from association mapping of quantitative traits in humans and domesticated plants, we find that across dog breeds, a small number of quantitative trait loci (≤3) explain the majority of phenotypic variation for most of the traits we studied. In addition, many genomic regions show signatures of recent selection, with most of the highly differentiated regions being associated with breed-defining traits such as body size, coat characteristics, and ear floppiness. Our results demonstrate the efficacy of mapping multiple traits in the domestic dog using a database of genotyped individuals and highlight the important role human-directed selection has played in altering the genetic architecture of key traits in this important species. Competing Interests: The authors have declared that no competing interests exist.
- Published
- 2010
24. Detection of nonneutral substitution rates on mammalian phylogenies.
- Author
-
Pollard KS, Hubisz MJ, Rosenbloom KR, and Siepel A
- Subjects
- Animals, Computer Simulation, Conserved Sequence, Humans, Likelihood Functions, Mammals classification, Models, Genetic, Models, Statistical, Primates genetics, Sequence Alignment, Software, Species Specificity, Base Sequence, Evolution, Molecular, Mammals genetics, Phylogeny, Selection, Genetic
- Abstract
Methods for detecting nucleotide substitution rates that are faster or slower than expected under neutral drift are widely used to identify candidate functional elements in genomic sequences. However, most existing methods consider either reductions (conservation) or increases (acceleration) in rate but not both, or assume that selection acts uniformly across the branches of a phylogeny. Here we examine the more general problem of detecting departures from the neutral rate of substitution in either direction, possibly in a clade-specific manner. We consider four statistical, phylogenetic tests for addressing this problem: a likelihood ratio test, a score test, a test based on exact distributions of numbers of substitutions, and the genomic evolutionary rate profiling (GERP) test. All four tests have been implemented in a freely available program called phyloP. Based on extensive simulation experiments, these tests are remarkably similar in statistical power. With 36 mammalian species, they all appear to be capable of fairly good sensitivity with low false-positive rates in detecting strong selection at individual nucleotides, moderate selection in 3-bp elements, and weaker or clade-specific selection in longer elements. By applying phyloP to mammalian multiple alignments from the ENCODE project, we shed light on patterns of conservation/acceleration in known and predicted functional elements, approximate fractions of sites subject to constraint, and differences in clade-specific selection in the primate and glires clades. We also describe new "Conservation" tracks in the UCSC Genome Browser that display both phyloP and phastCons scores for genome-wide alignments of 44 vertebrate species.
- Published
- 2010
25. Targets of balancing selection in the human genome.
- Author
-
Andrés AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, Clark AG, and Nielsen R
- Subjects
- Alleles, Chromosome Segregation genetics, Demography, Haplotypes genetics, Humans, Quantitative Trait, Heritable, Sequence Analysis, DNA, Genome, Human genetics, Selection, Genetic
- Abstract
Balancing selection is potentially an important biological force for maintaining advantageous genetic diversity in populations, including variation that is responsible for long-term adaptation to the environment. By serving as a means to maintain genetic variation, it may be particularly relevant to maintaining phenotypic variation in natural populations. Nevertheless, its prevalence and specific targets in the human genome remain largely unknown. We have analyzed the patterns of diversity and divergence of 13,400 genes in two human populations using an unbiased single-nucleotide polymorphism data set, a genome-wide approach, and a method that incorporates demography in neutrality tests. We identified an unbiased catalog of genes with signatures of long-term balancing selection, which includes immunity genes as well as genes encoding keratins and membrane channels; the catalog also shows enrichment in functional categories involved in cellular structure. Patterns are mostly concordant in the two populations, with a small fraction of genes showing population-specific signatures of selection. Power considerations indicate that our findings represent a subset of all targets in the genome, suggesting that although balancing selection may not have an obvious impact on a large proportion of human genes, it is a key force affecting the evolution of a number of genes in humans.
- Published
- 2009
26. Inferring weak population structure with the assistance of sample group information.
- Author
-
Hubisz MJ, Falush D, Stephens M, and Pritchard JK
- Abstract
Genetic clustering algorithms require a certain amount of data to produce informative results. In the common situation that individuals are sampled at several locations, we show how sample group information can be used to achieve better results when the amount of data is limited. New models are developed for the structure program, both for the cases of admixture and no admixture. These models work by modifying the prior distribution for each individual's population assignment. The new prior distributions allow the proportion of individuals assigned to a particular cluster to vary by location. The models are tested on simulated data, and illustrated using microsatellite data from the CEPH Human Genome Diversity Panel. We demonstrate that the new models allow structure to be detected at lower levels of divergence, or with less data, than the original structure models or principal components methods, and that they are not biased towards detecting structure when it is not present. These models are implemented in a new version of structure which is freely available online at http://pritch.bsd.uchicago.edu/structure.html. © 2009 Blackwell Publishing Ltd.
- Published
- 2009
27. Darwinian and demographic forces affecting human protein coding genes.
- Author
-
Nielsen R, Hubisz MJ, Hellmann I, Torgerson D, Andrés AM, Albrechtsen A, Gutenkunst R, Adams MD, Cargill M, Boyko A, Indap A, Bustamante CD, and Clark AG
- Subjects
- Black or African American genetics, Demography, Gene Frequency, Genetic Variation, Genome, Human, Humans, MicroRNAs genetics, Polymorphism, Single Nucleotide, White People genetics, Evolution, Molecular, Genetics, Population, Proteins genetics, Selection, Genetic
- Abstract
Past demographic changes can produce distortions in patterns of genetic variation that can mimic the appearance of natural selection unless the demographic effects are explicitly removed. Here we fit a detailed model of human demography that incorporates divergence, migration, admixture, and changes in population size to directly sequenced data from 13,400 protein coding genes from 20 European-American and 19 African-American individuals. Based on this demographic model, we use several new and established statistical methods for identifying genes with extreme patterns of polymorphism likely to be caused by Darwinian selection, providing the first genome-wide analysis of allele frequency distributions in humans based on directly sequenced data. The tests are based on observations of excesses of high-frequency derived alleles, excesses of low-frequency derived alleles, and excesses of differences in allele frequencies between populations. We detect numerous new genes with strong evidence of selection, including a number of genes related to psychiatric and other diseases. We also show that microRNA-controlled genes evolve under extremely high constraints and are more likely to undergo negative selection than other genes. Furthermore, we show that genes involved in muscle development have been subject to positive selection during recent human history. In accordance with previous studies, we find evidence for negative selection against mutations in genes associated with Mendelian disease and positive selection acting on genes associated with several complex diseases.
- Published
- 2009
28. Patterns of positive selection in six Mammalian genomes.
- Author
-
Kosiol C, Vinar T, da Fonseca RR, Hubisz MJ, Bustamante CD, Nielsen R, and Siepel A
- Subjects
- Animals, Bayes Theorem, Databases, Genetic, Dogs, Gene Expression, Humans, Likelihood Functions, Macaca mulatta, Mammals classification, Mice, Pan troglodytes, Phylogeny, Primates, Rats, Rodentia, Sequence Alignment, Evolution, Molecular, Genome, Mammals genetics, Selection, Genetic
- Abstract
Genome-wide scans for positively selected genes (PSGs) in mammals have provided insight into the dynamics of genome evolution, the genetic basis of differences between species, and the functions of individual genes. However, previous scans have been limited in power and accuracy owing to small numbers of available genomes. Here we present the most comprehensive examination of mammalian PSGs to date, using the six high-coverage genome assemblies now available for eutherian mammals. The increased phylogenetic depth of this dataset results in substantially improved statistical power, and permits several new lineage- and clade-specific tests to be applied. Of approximately 16,500 human genes with high-confidence orthologs in at least two other species, 400 genes showed significant evidence of positive selection (FDR < 0.05), according to a standard likelihood ratio test. An additional 144 genes showed evidence of positive selection on particular lineages or clades. As in previous studies, the identified PSGs were enriched for roles in defense/immunity, chemosensory perception, and reproduction, but enrichments were also evident for more specific functions, such as complement-mediated immunity and taste perception. Several pathways were strongly enriched for PSGs, suggesting possible co-evolution of interacting genes. A novel Bayesian analysis of the possible "selection histories" of each gene indicated that most PSGs have switched multiple times between positive selection and nonselection, suggesting that positive selection is often episodic. A detailed analysis of Affymetrix exon array data indicated that PSGs are expressed at significantly lower levels, and in a more tissue-specific manner, than non-PSGs. Genes that are specifically expressed in the spleen, testes, liver, and breast are significantly enriched for PSGs, but no evidence was found for an enrichment for PSGs among brain-specific genes. This study provides additional evidence for widespread positive selection in mammalian evolution and new genome-wide insights into the functional implications of positive selection. Competing Interests: The authors have declared that no competing interests exist.
- Published
- 2008
29. Proportionally more deleterious genetic variation in European than in African populations.
- Author
-
Lohmueller KE, Indap AR, Schmidt S, Boyko AR, Hernandez RD, Hubisz MJ, Sninsky JJ, White TJ, Sunyaev SR, Nielsen R, Clark AG, and Bustamante CD
- Subjects
- Africa ethnology, Alleles, Computational Biology, Emigration and Immigration, Europe ethnology, Exons genetics, Heterozygote, Homozygote, Humans, Polymerase Chain Reaction, United States, Genome, Human genetics, Polymorphism, Single Nucleotide genetics
- Abstract
Quantifying the number of deleterious mutations per diploid human genome is of crucial concern to both evolutionary and medical geneticists. Here we combine genome-wide polymorphism data from PCR-based exon resequencing, comparative genomic data across mammalian species, and protein structure predictions to estimate the number of functionally consequential single-nucleotide polymorphisms (SNPs) carried by each of 15 African American (AA) and 20 European American (EA) individuals. We find that AAs show significantly higher levels of nucleotide heterozygosity than do EAs for all categories of functional SNPs considered, including synonymous, non-synonymous, predicted 'benign', predicted 'possibly damaging' and predicted 'probably damaging' SNPs. This result is wholly consistent with previous work showing higher overall levels of nucleotide variation in African populations than in Europeans. EA individuals, in contrast, have significantly more genotypes homozygous for the derived allele at synonymous and non-synonymous SNPs and for the damaging allele at 'probably damaging' SNPs than AAs do. For SNPs segregating only in one population or the other, the proportion of non-synonymous SNPs is significantly higher in the EA sample (55.4%) than in the AA sample (47.0%; P < 2.3 × 10⁻³⁷). We observe a similar proportional excess of SNPs that are inferred to be 'probably damaging' (15.9% in EA; 12.1% in AA; P < 3.3 × 10⁻¹¹). Using extensive simulations, we show that this excess proportion of segregating damaging alleles in Europeans is probably a consequence of a bottleneck that Europeans experienced at about the time of the migration out of Africa.
- Published
- 2008
30. Patterns of mutation and selection at synonymous sites in Drosophila.
- Author
-
Singh ND, Bauer DuMont VL, Hubisz MJ, Nielsen R, and Aquadro CF
- Subjects
- Animals, Base Composition genetics, Codon genetics, Genes, Insect, Genes, X-Linked, Polymorphism, Single Nucleotide genetics, Drosophila genetics, Mutation genetics, Selection, Genetic
- Abstract
That natural selection affects molecular evolution at synonymous sites in protein-coding sequences is well established and is thought to predominantly reflect selection for translational efficiency/accuracy mediated through codon bias. However, a recently developed maximum likelihood framework, when applied to 18 coding sequences in 3 species of Drosophila, confirmed an earlier report that the Notch gene in Drosophila melanogaster was evolving under selection in favor of those codons defined as unpreferred in this species. This finding opened the possibility that synonymous sites may be subject to a variety of selective pressures beyond weak selection for increased frequencies of the codons currently defined as "preferred" in D. melanogaster. To further explore patterns of synonymous site evolution in Drosophila in a lineage-specific manner, we expanded the application of the maximum likelihood framework to 8,452 protein coding sequences with well-defined orthology in D. melanogaster, Drosophila sechellia, and Drosophila yakuba. Our analyses reveal intragenomic and interspecific variation in mutational patterns as well as in patterns and intensity of selection on synonymous sites. In D. melanogaster, our results provide little statistical evidence for recent selection on synonymous sites, and Notch remains an outlier. In contrast, in D. sechellia our findings provide evidence in support of selection predominantly in favor of preferred codons. However, there is a small subset of genes in this species that appear to be evolving under selection in favor of unpreferred codons, which indicates that selection on synonymous sites is not limited to the preferential fixation of mutations that enhance the speed or accuracy of translation in this species.
- Published
- 2007
31. Evolution of genes and genomes on the Drosophila phylogeny.
- Author
-
Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W, Iyer VN, Pollard DA, Sackton TB, Larracuente AM, Singh ND, Abad JP, Abt DN, Adryan B, Aguade M, Akashi H, Anderson WW, Aquadro CF, Ardell DH, Arguello R, Artieri CG, Barbash DA, Barker D, Barsanti P, Batterham P, Batzoglou S, Begun D, Bhutkar A, Blanco E, Bosak SA, Bradley RK, Brand AD, Brent MR, Brooks AN, Brown RH, Butlin RK, Caggese C, Calvi BR, Bernardo de Carvalho A, Caspi A, Castrezana S, Celniker SE, Chang JL, Chapple C, Chatterji S, Chinwalla A, Civetta A, Clifton SW, Comeron JM, Costello JC, Coyne JA, Daub J, David RG, Delcher AL, Delehaunty K, Do CB, Ebling H, Edwards K, Eickbush T, Evans JD, Filipski A, Findeiss S, Freyhult E, Fulton L, Fulton R, Garcia AC, Gardiner A, Garfield DA, Garvin BE, Gibson G, Gilbert D, Gnerre S, Godfrey J, Good R, Gotea V, Gravely B, Greenberg AJ, Griffiths-Jones S, Gross S, Guigo R, Gustafson EA, Haerty W, Hahn MW, Halligan DL, Halpern AL, Halter GM, Han MV, Heger A, Hillier L, Hinrichs AS, Holmes I, Hoskins RA, Hubisz MJ, Hultmark D, Huntley MA, Jaffe DB, Jagadeeshan S, Jeck WR, Johnson J, Jones CD, Jordan WC, Karpen GH, Kataoka E, Keightley PD, Kheradpour P, Kirkness EF, Koerich LB, Kristiansen K, Kudrna D, Kulathinal RJ, Kumar S, Kwok R, Lander E, Langley CH, Lapoint R, Lazzaro BP, Lee SJ, Levesque L, Li R, Lin CF, Lin MF, Lindblad-Toh K, Llopart A, Long M, Low L, Lozovsky E, Lu J, Luo M, Machado CA, Makalowski W, Marzo M, Matsuda M, Matzkin L, McAllister B, McBride CS, McKernan B, McKernan K, Mendez-Lago M, Minx P, Mollenhauer MU, Montooth K, Mount SM, Mu X, Myers E, Negre B, Newfeld S, Nielsen R, Noor MA, O'Grady P, Pachter L, Papaceit M, Parisi MJ, Parisi M, Parts L, Pedersen JS, Pesole G, Phillippy AM, Ponting CP, Pop M, Porcelli D, Powell JR, Prohaska S, Pruitt K, Puig M, Quesneville H, Ram KR, Rand D, Rasmussen MD, Reed LK, Reenan R, Reily A, Remington KA, Rieger TT, Ritchie MG, Robin C, Rogers YH, Rohde C, Rozas J, 
Rubenfield MJ, Ruiz A, Russo S, Salzberg SL, Sanchez-Gracia A, Saranga DJ, Sato H, Schaeffer SW, Schatz MC, Schlenke T, Schwartz R, Segarra C, Singh RS, Sirot L, Sirota M, Sisneros NB, Smith CD, Smith TF, Spieth J, Stage DE, Stark A, Stephan W, Strausberg RL, Strempel S, Sturgill D, Sutton G, Sutton GG, Tao W, Teichmann S, Tobari YN, Tomimura Y, Tsolas JM, Valente VL, Venter E, Venter JC, Vicario S, Vieira FG, Vilella AJ, Villasante A, Walenz B, Wang J, Wasserman M, Watts T, Wilson D, Wilson RK, Wing RA, Wolfner MF, Wong A, Wong GK, Wu CI, Wu G, Yamamoto D, Yang HP, Yang SP, Yorke JA, Yoshida K, Zdobnov E, Zhang P, Zhang Y, Zimin AV, Baldwin J, Abdouelleil A, Abdulkadir J, Abebe A, Abera B, Abreu J, Acer SC, Aftuck L, Alexander A, An P, Anderson E, Anderson S, Arachi H, Azer M, Bachantsang P, Barry A, Bayul T, Berlin A, Bessette D, Bloom T, Blye J, Boguslavskiy L, Bonnet C, Boukhgalter B, Bourzgui I, Brown A, Cahill P, Channer S, Cheshatsang Y, Chuda L, Citroen M, Collymore A, Cooke P, Costello M, D'Aco K, Daza R, De Haan G, DeGray S, DeMaso C, Dhargay N, Dooley K, Dooley E, Doricent M, Dorje P, Dorjee K, Dupes A, Elong R, Falk J, Farina A, Faro S, Ferguson D, Fisher S, Foley CD, Franke A, Friedrich D, Gadbois L, Gearin G, Gearin CR, Giannoukos G, Goode T, Graham J, Grandbois E, Grewal S, Gyaltsen K, Hafez N, Hagos B, Hall J, Henson C, Hollinger A, Honan T, Huard MD, Hughes L, Hurhula B, Husby ME, Kamat A, Kanga B, Kashin S, Khazanovich D, Kisner P, Lance K, Lara M, Lee W, Lennon N, Letendre F, LeVine R, Lipovsky A, Liu X, Liu J, Liu S, Lokyitsang T, Lokyitsang Y, Lubonja R, Lui A, MacDonald P, Magnisalis V, Maru K, Matthews C, McCusker W, McDonough S, Mehta T, Meldrim J, Meneus L, Mihai O, Mihalev A, Mihova T, Mittelman R, Mlenga V, Montmayeur A, Mulrain L, Navidi A, Naylor J, Negash T, Nguyen T, Nguyen N, Nicol R, Norbu C, Norbu N, Novod N, O'Neill B, Osman S, Markiewicz E, Oyono OL, Patti C, Phunkhang P, Pierre F, Priest M, Raghuraman S, Rege F, Reyes R, Rise C, 
Rogov P, Ross K, Ryan E, Settipalli S, Shea T, Sherpa N, Shi L, Shih D, Sparrow T, Spaulding J, Stalker J, Stange-Thomann N, Stavropoulos S, Stone C, Strader C, Tesfaye S, Thomson T, Thoulutsang Y, Thoulutsang D, Topham K, Topping I, Tsamla T, Vassiliev H, Vo A, Wangchuk T, Wangdi T, Weiand M, Wilkinson J, Wilson A, Yadav S, Young G, Yu Q, Zembek L, Zhong D, Zimmer A, Zwirko Z, Jaffe DB, Alvarez P, Brockman W, Butler J, Chin C, Gnerre S, Grabherr M, Kleber M, Mauceli E, and MacCallum I
- Subjects
- Animals, Codon genetics, DNA Transposable Elements genetics, Drosophila immunology, Drosophila metabolism, Drosophila Proteins genetics, Gene Order genetics, Genome, Mitochondrial genetics, Immunity genetics, Multigene Family genetics, RNA, Untranslated genetics, Reproduction genetics, Sequence Alignment, Sequence Analysis, DNA, Synteny genetics, Drosophila classification, Drosophila genetics, Evolution, Molecular, Genes, Insect genetics, Genome, Insect genetics, Genomics, Phylogeny
- Abstract
Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.
- Published
- 2007
32. Localizing recent adaptive evolution in the human genome.
- Author
-
Williamson SH, Hubisz MJ, Clark AG, Payseur BA, Bustamante CD, and Nielsen R
- Subjects
- Humans, Polymorphism, Single Nucleotide, Adaptation, Biological genetics, Evolution, Molecular, Genome, Human
- Abstract
Identifying genomic locations that have experienced selective sweeps is an important first step toward understanding the molecular basis of adaptive evolution. Using statistical methods that account for the confounding effects of population demography, recombination rate variation, and single-nucleotide polymorphism ascertainment, while also providing fine-scale estimates of the position of the selected site, we analyzed a genomic dataset of 1.2 million human single-nucleotide polymorphisms genotyped in African-American, European-American, and Chinese samples. We identify 101 regions of the human genome with very strong evidence (p < 10^-5) of a recent selective sweep and where our estimate of the position of the selective sweep falls within 100 kb of a known gene. Within these regions, genes of biological interest include genes in pigmentation pathways, components of the dystrophin protein complex, clusters of olfactory receptors, genes involved in nervous system development and function, immune system genes, and heat shock genes. We also observe consistent evidence of selective sweeps in centromeric regions. In general, we find that recent adaptation is strikingly pervasive in the human genome, with as much as 10% of the genome affected by linkage to a selective sweep. Competing interests: The authors have declared that no competing interests exist.
- Published
- 2007
33. Demographic histories and patterns of linkage disequilibrium in Chinese and Indian rhesus macaques.
- Author
-
Hernandez RD, Hubisz MJ, Wheeler DA, Smith DG, Ferguson B, Rogers J, Nazareth L, Indap A, Bourquin T, McPherson J, Muzny D, Gibbs R, Nielsen R, and Bustamante CD
- Subjects
- Animals, China, DNA, Mitochondrial, Demography, Genetics, Medical, Humans, India, Polymorphism, Single Nucleotide, Linkage Disequilibrium, Macaca mulatta genetics
- Abstract
To understand the demographic history of rhesus macaques (Macaca mulatta) and document the extent of linkage disequilibrium (LD) in the genome, we partially resequenced five Encyclopedia of DNA Elements regions in 9 Chinese and 38 captive-born Indian rhesus macaques. Population genetic analyses of the 1467 single-nucleotide polymorphisms discovered suggest that the two populations separated about 162,000 years ago, with the Chinese population tripling in size since then and the Indian population eventually shrinking by a factor of four. Using coalescent simulations, we confirmed that these inferred demographic events explain a much faster decay of LD in Chinese (r² ≈ 0.15 at 10 kilobases) versus Indian (r² ≈ 0.52 at 10 kilobases) macaque populations.
- Published
- 2007
34. Evolutionary and biomedical insights from the rhesus macaque genome.
- Author
-
Gibbs RA, Rogers J, Katze MG, Bumgarner R, Weinstock GM, Mardis ER, Remington KA, Strausberg RL, Venter JC, Wilson RK, Batzer MA, Bustamante CD, Eichler EE, Hahn MW, Hardison RC, Makova KD, Miller W, Milosavljevic A, Palermo RE, Siepel A, Sikela JM, Attaway T, Bell S, Bernard KE, Buhay CJ, Chandrabose MN, Dao M, Davis C, Delehaunty KD, Ding Y, Dinh HH, Dugan-Rocha S, Fulton LA, Gabisi RA, Garner TT, Godfrey J, Hawes AC, Hernandez J, Hines S, Holder M, Hume J, Jhangiani SN, Joshi V, Khan ZM, Kirkness EF, Cree A, Fowler RG, Lee S, Lewis LR, Li Z, Liu YS, Moore SM, Muzny D, Nazareth LV, Ngo DN, Okwuonu GO, Pai G, Parker D, Paul HA, Pfannkoch C, Pohl CS, Rogers YH, Ruiz SJ, Sabo A, Santibanez J, Schneider BW, Smith SM, Sodergren E, Svatek AF, Utterback TR, Vattathil S, Warren W, White CS, Chinwalla AT, Feng Y, Halpern AL, Hillier LW, Huang X, Minx P, Nelson JO, Pepin KH, Qin X, Sutton GG, Venter E, Walenz BP, Wallis JW, Worley KC, Yang SP, Jones SM, Marra MA, Rocchi M, Schein JE, Baertsch R, Clarke L, Csürös M, Glasscock J, Harris RA, Havlak P, Jackson AR, Jiang H, Liu Y, Messina DN, Shen Y, Song HX, Wylie T, Zhang L, Birney E, Han K, Konkel MK, Lee J, Smit AF, Ullmer B, Wang H, Xing J, Burhans R, Cheng Z, Karro JE, Ma J, Raney B, She X, Cox MJ, Demuth JP, Dumas LJ, Han SG, Hopkins J, Karimpour-Fard A, Kim YH, Pollack JR, Vinar T, Addo-Quaye C, Degenhardt J, Denby A, Hubisz MJ, Indap A, Kosiol C, Lahn BT, Lawson HA, Marklein A, Nielsen R, Vallender EJ, Clark AG, Ferguson B, Hernandez RD, Hirani K, Kehrer-Sawatzki H, Kolb J, Patil S, Pu LL, Ren Y, Smith DG, Wheeler DA, Schenck I, Ball EV, Chen R, Cooper DN, Giardine B, Hsu F, Kent WJ, Lesk A, Nelson DL, O'brien WE, Prüfer K, Stenson PD, Wallace JC, Ke H, Liu XM, Wang P, Xiang AP, Yang F, Barber GP, Haussler D, Karolchik D, Kern AD, Kuhn RM, Smith KE, and Zwieg AS
- Subjects
- Animals, Biomedical Research, Female, Gene Duplication, Gene Rearrangement, Genetic Diseases, Inborn, Genetic Variation, Humans, Male, Multigene Family, Mutation, Pan troglodytes genetics, Sequence Analysis, DNA, Species Specificity, Evolution, Molecular, Genome, Macaca mulatta genetics
- Abstract
The rhesus macaque (Macaca mulatta) is an abundant primate species that diverged from the ancestors of Homo sapiens about 25 million years ago. Because they are genetically and physiologically similar to humans, rhesus monkeys are the most widely used nonhuman primate in basic and applied biomedical research. We determined the genome sequence of an Indian-origin Macaca mulatta female and compared the data with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families. A comparison of sequences from individual animals was used to investigate their underlying genetic diversity. The complete description of the macaque genome blueprint enhances the utility of this animal model for biomedical research and improves our understanding of the basic biology of the species.
- Published
- 2007
35. Adaptive genic evolution in the Drosophila genomes.
- Author
-
Shapiro JA, Huang W, Zhang C, Hubisz MJ, Lu J, Turissini DA, Fang S, Wang HY, Hudson RR, Nielsen R, Chen Z, and Wu CI
- Subjects
- Amino Acid Substitution, Animals, Molecular Sequence Data, Polymorphism, Genetic, Recombination, Genetic, Adaptation, Biological genetics, Drosophila melanogaster genetics, Evolution, Molecular, Genome, Insect genetics
- Abstract
Determining the extent of adaptive evolution at the genomic level is central to our understanding of molecular evolution. A suitable observation for this purpose would consist of polymorphic data on a large and unbiased collection of genes from two closely related species, each having a large and stable population. In this study, we sequenced 419 genes from 24 lines of Drosophila melanogaster and its close relatives. Together with data from Drosophila simulans, these data reveal the following. (i) Approximately 10% of the loci in regions of normal recombination are much less polymorphic at silent sites than expected, hinting at the action of selective sweeps. (ii) The level of polymorphism is negatively correlated with the rate of nonsynonymous divergence across loci. Thus, even under strict neutrality, the ratio of amino acid to silent nucleotide changes (A:S) between Drosophila species is expected to be 25-40% higher than the A:S ratio for polymorphism when data are pooled across the genome. (iii) The observed A/S ratio between species among the 419 loci is 28.9% higher than the (adjusted) neutral expectation. We estimate that nearly 30% of the amino acid substitutions between D. melanogaster and its close relatives were adaptive. (iv) This signature of adaptive evolution is observable only in regions of normal recombination. Hence, the low level of polymorphism observed in regions of reduced recombination may not be driven primarily by positive selection. Finally, we discuss the theories and data pertaining to the interpretation of adaptive evolution in genomic studies.
- Published
- 2007
36. Maximum likelihood estimation of ancestral codon usage bias parameters in Drosophila.
- Author
-
Nielsen R, Bauer DuMont VL, Hubisz MJ, and Aquadro CF
- Subjects
- Animals, Drosophila melanogaster genetics, Genes, Insect, Genetics, Population, Likelihood Functions, Mutation, Codon genetics, Drosophila genetics
- Abstract
We present a likelihood method for estimating codon usage bias parameters along the lineages of a phylogeny. The method is an extension of the classical codon-based models used for estimating dN/dS ratios along the lineages of a phylogeny. However, we add one extra parameter for each lineage: the selection coefficient for optimal codon usage (S), allowing joint maximum likelihood estimation of S and the dN/dS ratio. We apply the method to previously published data from Drosophila melanogaster, Drosophila simulans, and Drosophila yakuba and show, in accordance with previous results, that the D. melanogaster lineage has experienced a reduction in the selection for optimal codon usage. However, the D. melanogaster lineage has also experienced a change in the biological mutation rates relative to D. simulans, in particular, a relative reduction in the mutation rate from A to G and an increase in the mutation rate from C to T. However, neither a reduction in the strength of selection nor a change in the mutational pattern can alone explain all of the data observed in the D. melanogaster lineage. For example, we also confirm previous results showing that the Notch locus has experienced positive selection for previously classified unpreferred mutations.
- Published
- 2007
37. Ascertainment bias in studies of human genome-wide polymorphism.
- Author
-
Clark AG, Hubisz MJ, Bustamante CD, Williamson SH, and Nielsen R
- Subjects
- Databases, Genetic, Genetic Carrier Screening methods, Genotype, Haplotypes genetics, Humans, Genetics, Population, Genome, Human genetics, Genomics methods, Polymorphism, Genetic, Selection Bias
- Abstract
Large-scale SNP genotyping studies rely on an initial assessment of nucleotide variation to identify sites in the DNA sequence that harbor variation among individuals. This "SNP discovery" sample may be quite variable in size and composition, and it has been well established that properties of the SNPs that are found are influenced by the discovery sampling effort. The International HapMap project relied on nearly any piece of information available to identify SNPs (including BAC end sequences, shotgun reads, and differences between public and private sequences) and even made use of chimpanzee data to confirm human sequence differences. In addition, the ascertainment criteria shifted from using only SNPs that had been validated in population samples, to double-hit SNPs, to finally accepting SNPs that were singletons in small discovery samples. In contrast, Perlegen's primary discovery was a resequencing-by-hybridization effort using the 24 people of diverse origin in the Polymorphism Discovery Resource. Here we take these two data sets and contrast two basic summary statistics, heterozygosity and F_ST, as well as the site frequency spectra, for 500-kb windows spanning the genome. The magnitude of disparity between these samples in these measures of variability indicates that population genetic analysis on the raw genotype data is ill advised. Given the knowledge of the discovery samples, we perform an ascertainment correction and show how the post-correction data are more consistent across these studies. However, discrepancies persist, suggesting that the heterogeneity in the SNP discovery process of the HapMap project resulted in a data set resistant to complete ascertainment correction. Ascertainment bias will likely erode the power of tests of association between SNPs and complex disorders, but the effect will likely be small, and perhaps more importantly, it is unlikely that the bias will introduce false-positive inferences.
- Published
- 2005
38. Genomic scans for selective sweeps using SNP data.
- Author
-
Nielsen R, Williamson S, Kim Y, Hubisz MJ, Clark AG, and Bustamante C
- Subjects
- Black or African American genetics, Computer Simulation, Genomics methods, Haplotypes genetics, Humans, White People genetics, Genome, Human genetics, Models, Genetic, Polymorphism, Single Nucleotide genetics, Selection, Genetic
- Abstract
Detecting selective sweeps from genomic SNP data is complicated by the intricate ascertainment schemes used to discover SNPs, and by the confounding influence of the underlying complex demographics and varying mutation and recombination rates. Current methods for detecting selective sweeps have little or no robustness to the demographic assumptions and varying recombination rates, and provide no method for correcting for ascertainment biases. Here, we present several new tests aimed at detecting selective sweeps from genomic SNP data. Using extensive simulations, we show that a new parametric test, based on composite likelihood, has a high power to detect selective sweeps and is surprisingly robust to assumptions regarding recombination rates and demography (i.e., has low Type I error). Our new test also provides estimates of the location of the selective sweep(s) and the magnitude of the selection coefficient. To illustrate the method, we apply our approach to data from the Seattle SNP project and to Chromosome 2 data from the HapMap project. In Chromosome 2, the most extreme signal is found in the lactase gene, which previously has been shown to be undergoing positive selection. Evidence for selective sweeps is also found in many other regions, including genes known to be associated with disease risk such as DPP10 and COL4A3.
- Published
- 2005
39. Detecting coevolving amino acid sites using Bayesian mutational mapping.
- Author
-
Dimmic MW, Hubisz MJ, Bustamante CD, and Nielsen R
- Subjects
- Bayes Theorem, Binding Sites, Evolution, Molecular, Likelihood Functions, Markov Chains, Models, Statistical, Multigene Family, Mutation, Phosphoglycerate Kinase genetics, Amino Acids chemistry, Chromosome Mapping methods, Computational Biology methods, DNA Mutational Analysis
- Abstract
Motivation: The evolution of protein sequences is constrained by complex interactions between amino acid residues. Because harmful substitutions may be compensated for by other substitutions at neighboring sites, residues can coevolve. We describe a Bayesian phylogenetic approach to the detection of coevolving residues in protein families. This method, Bayesian mutational mapping (BMM), assigns mutations to the branches of the evolutionary tree stochastically, and then test statistics are calculated to determine whether a coevolutionary signal exists in the mapping. Posterior predictive P-values provide an estimate of significance, and specificity is maintained by integrating over uncertainty in the estimation of the tree topology, branch lengths and substitution rates. A coevolutionary Markov model for codon substitution is also described, and this model is used as the basis of several test statistics. Results: Results on simulated coevolutionary data indicate that the BMM method can successfully detect nearly all coevolving sites when the model has been correctly specified, and that non-parametric statistics such as mutual information are generally less powerful than parametric statistics. On a dataset of eukaryotic proteins from the phosphoglycerate kinase (PGK) family, interdomain site contacts yield a significantly greater coevolutionary signal than interdomain non-contacts, an indication that the method provides information about interacting sites. Failure to account for the heterogeneity in rates across sites in PGK resulted in a less discriminating test, yielding a marked increase in the number of reported positives at both contact and non-contact sites. Supplementary Information: http://www.dimmic.net/supplement/
- Published
- 2005
40. A scan for positively selected genes in the genomes of humans and chimpanzees.
- Author
-
Nielsen R, Bustamante C, Clark AG, Glanowski S, Sackton TB, Hubisz MJ, Fledel-Alon A, Tanenbaum DM, Civello D, White TJ, Sninsky JJ, Adams MD, and Cargill M
- Subjects
- Animals, Evolution, Molecular, Humans, Likelihood Functions, Polymerase Chain Reaction, Selection, Genetic, Zinc Fingers genetics, Genome, Genome, Human, Pan troglodytes genetics
- Abstract
Since the divergence of humans and chimpanzees about 5 million years ago, these species have undergone a remarkable evolution with drastic divergence in anatomy and cognitive abilities. At the molecular level, despite the small overall magnitude of DNA sequence divergence, we might expect such evolutionary changes to leave a noticeable signature throughout the genome. We here compare 13,731 annotated genes from humans to their chimpanzee orthologs to identify genes that show evidence of positive selection. Many of the genes that present a signature of positive selection tend to be involved in sensory perception or immune defenses. However, the group of genes that show the strongest evidence for positive selection also includes a surprising number of genes involved in tumor suppression and apoptosis, and of genes involved in spermatogenesis. We hypothesize that positive selection in some of these genes may be driven by genomic conflict due to apoptosis during spermatogenesis. Genes with maximal expression in the brain show little or no evidence for positive selection, while genes with maximal expression in the testis tend to be enriched with positively selected genes. Genes on the X chromosome also tend to show an elevated tendency for positive selection. We also present polymorphism data from 20 Caucasian Americans and 19 African Americans for the 50 annotated genes showing the strongest evidence for positive selection. The polymorphism analysis further supports the presence of positive selection in these genes by showing an excess of high-frequency derived nonsynonymous mutations.
- Published
- 2005
41. Evolutionary genomics: detecting selection needs comparative data.
- Author
-
Nielsen R and Hubisz MJ
- Subjects
- Animals, Genome, Mutagenesis genetics, Mutation, Missense genetics, Plasmodium falciparum genetics, Reproducibility of Results, Biological Evolution, Codon genetics, Genomics methods, Models, Genetic, Selection, Genetic
- Abstract
Positive selection at the molecular level is usually indicated by an increase in the ratio of non-synonymous to synonymous substitutions (dN/dS) in comparative data. However, Plotkin et al. describe a new method for detecting positive selection based on a single nucleotide sequence. We show here that this method is particularly sensitive to assumptions regarding the underlying mutational processes and does not provide a reliable way to identify positive selection.
- Published
- 2005
42. Comparative genome sequencing of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution.
- Author
-
Richards S, Liu Y, Bettencourt BR, Hradecky P, Letovsky S, Nielsen R, Thornton K, Hubisz MJ, Chen R, Meisel RP, Couronne O, Hua S, Smith MA, Zhang P, Liu J, Bussemaker HJ, van Batenburg MF, Howells SL, Scherer SE, Sodergren E, Matthews BB, Crosby MA, Schroeder AJ, Ortiz-Barrientos D, Rives CM, Metzker ML, Muzny DM, Scott G, Steffen D, Wheeler DA, Worley KC, Havlak P, Durbin KJ, Egan A, Gill R, Hume J, Morgan MB, Miner G, Hamilton C, Huang Y, Waldron L, Verduzco D, Clerc-Blankenburg KP, Dubchak I, Noor MA, Anderson W, White KP, Clark AG, Schaeffer SW, Gelbart W, Weinstock GM, and Gibbs RA
- Subjects
- Animals, Chromosome Breakage genetics, Chromosome Inversion genetics, Chromosome Mapping methods, Conserved Sequence genetics, Drosophila melanogaster genetics, Enhancer Elements, Genetic, Gene Rearrangement genetics, Genetic Variation genetics, Molecular Sequence Data, Predictive Value of Tests, Repetitive Sequences, Nucleic Acid genetics, Chromosomes genetics, Drosophila genetics, Evolution, Molecular, Genes, Insect genetics, Genome, Sequence Analysis, DNA methods
- Abstract
We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each arm gene order has been extensively reshuffled, leading to a minimum of 921 syntenic blocks shared between the species. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 25-55 million years (Myr) since the pseudoobscura/melanogaster divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than random and nearby sequences between the species, but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a pattern of repeat-mediated chromosomal rearrangement, and high coadaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence between these species of Drosophila.
- Published
- 2005
43. Reconstituting the frequency spectrum of ascertained single-nucleotide polymorphism data.
- Author
-
Nielsen R, Hubisz MJ, and Clark AG
- Subjects
- Alleles, Data Interpretation, Statistical, Genetic Variation, Models, Genetic, Gene Frequency, Polymorphism, Single Nucleotide
- Abstract
Most of the available SNP data have eluded valid population genetic analysis because most population genetical methods do not correctly accommodate the special discovery process used to identify SNPs. Most of the available SNP data have allele frequency distributions that are biased by the ascertainment protocol. We here show how this problem can be corrected by obtaining maximum-likelihood estimates of the true allele frequency distribution. In simple cases, the ML estimate of the true allele frequency distribution can be obtained analytically, but in other cases computational methods based on numerical optimization or the EM algorithm must be used. We illustrate the new correction method by analyzing some previously published SNP data from the SNP Consortium. Appropriate treatment of SNP ascertainment is vital to our ability to make correct inferences from the data of the International HapMap Project.
- Published
- 2004