1. Simple sequence repeats in Escherichia coli: abundance, distribution, composition, and polymorphism.
- Author
- Gur-Arie R, Cohen CJ, Eitan Y, Shelef L, Hallerman EM, and Kashi Y
- Subjects
- Base Composition/genetics; Base Sequence; DNA, Bacterial/genetics; Evolution, Molecular; Molecular Sequence Data; Open Reading Frames/genetics; DNA, Bacterial/chemistry; DNA, Bacterial/metabolism; Escherichia coli/genetics; Escherichia coli/metabolism; Polymorphism, Genetic/genetics; Repetitive Sequences, Nucleic Acid/genetics
- Abstract
Computer-based genome-wide screening of the DNA sequence of Escherichia coli strain K12 revealed tens of thousands of tandem simple sequence repeat (SSR) tracts, with motifs ranging from 1 to 6 nucleotides. SSRs were well distributed throughout the genome. Mononucleotide SSRs were over-represented in noncoding regions and under-represented in open reading frames (ORFs). Nucleotide composition of mono- and dinucleotide SSRs, both in ORFs and in noncoding regions, differed from that of the genomic region in which they occurred, with 93% of all mononucleotide SSRs proving to be of A or T. Computer-based analysis of the fine position of every SSR locus in the noncoding portion of the genome relative to downstream ORFs showed SSRs located in areas that could affect gene regulation. DNA sequences at 14 arbitrarily chosen SSR tracts were compared among E. coli strains. Polymorphisms of SSR copy number were observed at four of seven mononucleotide SSR tracts screened, with all polymorphisms occurring in noncoding regions. SSR polymorphism could prove important as a genome-wide source of variation, both for practical applications (including rapid detection, strain identification, and detection of loci affecting key phenotypes) and for evolutionary adaptation of microbes.
- Published
- 2000
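
The abstract describes a computer-based screen for tandem SSR tracts with motifs of 1 to 6 nucleotides. The record does not give the authors' algorithm or thresholds, so the following is only a minimal sketch of how such a scan could look. The function name `find_ssrs`, the helper `is_primitive`, and the default `min_copies=3` are assumptions made for illustration, not details from the paper.

```python
def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, max_motif=6, min_copies=3):
    """Return (start, motif, copies) for each tandem repeat tract found.

    min_copies is a hypothetical threshold; the abstract does not state one.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # Skip motifs such as 'AA' so a mononucleotide run is not
            # reported again as a dinucleotide repeat.
            if not is_primitive(motif):
                i += 1
                continue
            # Extend the tract while the next k bases repeat the motif.
            j = i + k
            while seq[j:j + k] == motif:
                j += k
            copies = (j - i) // k
            if copies >= min_copies:
                hits.append((i, motif, copies))
                i = j  # resume after the tract
            else:
                i += 1
    hits.sort()
    return hits


if __name__ == "__main__":
    for start, motif, copies in find_ssrs("AAAAAATATAT"):
        print(start, motif, copies)
```

Comparing the `copies` value for the same locus across strains would be one way to look for the copy-number polymorphisms the abstract reports. Classifying each tract as coding or noncoding would additionally need ORF coordinates, which this sketch does not handle.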