1. Clusters of nucleotide substitutions and insertion/deletion mutations are associated with repeat sequences
- Author
- Michael J. McDonald, Hsien Da Huang, Jun-Yi Leu, and Wei-Chi Wang
- Subjects
- Mutation rate; DNA-Directed DNA Polymerase; Haploidy; Molecular Biology/Bioinformatics; INDEL Mutation; Biology (General); Evolutionary Biology/Genomics; Genetics; Genome; Microbiology/Microbial Evolution and Genomics; General Neuroscience; Eukaryota; food and beverages; Genetics and Genomics/Bioinformatics; Genetics and Genomics/Microbial Evolution and Genomics; Replication fork arrest; Evolutionary Biology/Human Evolution; Evolutionary Biology/Microbial Evolution and Genomics; Synopsis; Drosophila; Genetics and Genomics/Comparative Genomics; General Agricultural and Biological Sciences; Research Article; Genome evolution; Saccharomyces cerevisiae Proteins; Sequence analysis; QH301-705.5; Sequence alignment; Genomics; Molecular Biology/Molecular Evolution; Biology; General Biochemistry, Genetics and Molecular Biology; Evolution, Molecular; Saccharomyces; Genetics and Genomics/Population Genetics; Escherichia coli; Animals; Humans; Genetics and Genomics/Genomics; Repetitive Sequences, Nucleic Acid; Comparative genomics; Molecular Biology/DNA Repair; General Immunology and Microbiology; Bacteria; Evolutionary Biology/Evolutionary and Comparative Genetics; Models, Genetic; Point mutation; Genetic Variation; DNA Repair Enzymes; Haplotypes; human activities
- Abstract
The authors propose that short repeat sequences may play an important role in causing the pervasive clustering of mutations across diverse genomes from prokaryotes to humans.
The genome-sequencing gold rush has facilitated the use of comparative genomics to uncover patterns of genome evolution, although their causal mechanisms remain elusive. One such trend, ubiquitous to prokarya and eukarya, is the association of insertion/deletion mutations (indels) with increases in the nucleotide substitution rate extending over hundreds of base pairs. The prevailing hypothesis is that indels are themselves mutagenic agents. Here, we employ population genomics data from Escherichia coli, Saccharomyces paradoxus, and Drosophila to provide evidence suggesting that it is not the indels per se but the sequence in which indels occur that causes the accumulation of nucleotide substitutions. We found that about two-thirds of indels are closely associated with repeat sequences and that repeat sequence abundance could be used to identify regions of elevated sequence diversity, independently of indels. Moreover, the mutational signature of indel-proximal nucleotide substitutions matches that of error-prone DNA polymerases. We propose that repeat sequences promote an increased probability of replication fork arrest, causing the persistent recruitment of error-prone DNA polymerases to specific sequence regions over evolutionary time scales. Experimental measures of the mutation rates of engineered DNA sequences and analyses of experimentally obtained collections of spontaneous mutations provide molecular evidence supporting our hypothesis. This study uncovers a new role for repeat sequences in genome evolution and provides an explanation of how fine-scale sequence contextual effects influence mutation rates and thereby evolution.
Author Summary: An intriguing observation made during the comparison of genomes is that insertion and deletion mutations (indels) cluster together with nucleotide substitutions. Two (not mutually exclusive) hypotheses have been proposed to explain this phenomenon. The first postulates that an indel mutation causes an increase in the likelihood of the surrounding sequence incurring nucleotide substitutions, while the second claims that the region of DNA in which such a cluster is located is more likely to sustain both indels and substitutions. Here, we present evidence suggesting that the region of DNA, and not the indel, is associated with the accumulation of clusters of mutations over evolutionary time scales. We find that repeat sequences are closely associated with a large proportion of indels and that the abundance of repeat sequences is linked with regions of increased nucleotide diversity. By analysing molecular data and measuring the mutation rates of genes engineered to contain repeats, we find that the mutation rate can be manipulated by the insertion of long repeat sequences. On the basis of these results, we propose a model in which repeat sequences are prone to cause stalling of the high-fidelity DNA polymerase, leading to the recruitment of error-prone repair polymerases which then replicate the surrounding sequence with a higher-than-average error rate.
- Published
- 2011