253 results on '"Igor B. Rogozin"'
Search Results
152. Ecdysozoan clade rejected by genome-wide analysis of rare amino acid replacements
- Author
-
Liran Carmel, Yuri I. Wolf, Igor B. Rogozin, and Eugene V. Koonin
- Subjects
Molecular Sequence Data ,Genome ,Evolution, Molecular ,Phylogenetics ,Genetics ,Animals ,Amino Acid Sequence ,Amino Acids ,Clade ,Molecular Biology ,Peptide sequence ,Arthropods ,Ecology, Evolution, Behavior and Systematics ,Phylogeny ,biology ,Phylogenetic tree ,Sequence Homology, Amino Acid ,Nucleic acid sequence ,Genetic Variation ,biology.organism_classification ,Cladistics ,Amino Acid Substitution ,Evolutionary biology ,Ecdysozoa ,Sequence Alignment - Abstract
As the number of sequenced genomes from diverse walks of life rapidly increases, phylogenetic analysis is entering a new era: reconstruction of the evolutionary history of organisms on the basis of full-scale comparison of their genomes. In addition to brute force, genome-wide analysis of alignments, rare genomic changes (RGCs) that are thought to comprise derived shared characters of individual clades are increasingly used in genome-wide phylogenetic studies. We propose a new type of RGCs designated RGC_CAMs (after Conserved Amino acids-Multiple substitutions), which are inferred using a genome-scale analysis of protein and underlying nucleotide sequence alignments. The RGC_CAM approach utilizes amino acid residues conserved in major eukaryotic lineages, with the exception of a few species comprising a putative clade, and selects for phylogenetic inference only those amino acid replacements that require 2 or 3 nucleotide substitutions, in order to reduce homoplasy. The RGC_CAM analysis was combined with a procedure for rigorous statistical testing of competing phylogenetic hypotheses. The RGC_CAM method is shown to be robust to branch length differences and taxon sampling. When applied to animal phylogeny, the RGC_CAM approach strongly supports the coelomate clade that unites chordates with arthropods as opposed to the ecdysozoan (molting animals) clade. This conclusion runs against the view of animal evolution that is currently prevailing in the evo-devo community. The final solution to the coelomate-ecdysozoa controversy will require a much larger set of complete genome sequences representing diverse animal taxa. It is expected that RGC_CAM and other RGC-based methods will be crucial for these future, definitive phylogenetic studies.
- Published
- 2007
153. Known components of the immunoglobulin A:T mutational machinery are intact in Burkitt lymphoma cell lines with G:C bias
- Author
-
Chuancang Jiang, Madhumita Ray, Igor B. Rogozin, Alan B. Clark, Marilyn Diaz, and Zheng Xiao
- Subjects
DNA polymerase ,DNA repair ,Cytidine Triphosphate ,Immunology ,Somatic hypermutation ,DNA Mismatch Repair ,Article ,Mice ,Adenosine Triphosphate ,Cell Line, Tumor ,Cytidine Deaminase ,Activation-induced (cytidine) deaminase ,Animals ,Humans ,Thymine Nucleotides ,Molecular Biology ,Genetics ,biology ,Nucleotides ,Base excision repair ,Molecular biology ,Burkitt Lymphoma ,DNA Repair Enzymes ,MSH2 ,Uracil-DNA glycosylase ,Mutation ,biology.protein ,DNA mismatch repair ,Guanosine Triphosphate ,Somatic Hypermutation, Immunoglobulin - Abstract
The basis for mutations at A:T base pairs in immunoglobulin hypermutation and defining how AID interacts with the DNA of the immunoglobulin locus are major aspects of the immunoglobulin mutator mechanism where questions remain unanswered. Here, we examined the pattern of mutations generated in mice deficient in various DNA repair proteins implicated in A:T mutation and found a previously unappreciated bias at G:C base pairs in spectra from mice simultaneously deficient in DNA mismatch repair and uracil DNA glycosylase. This suggests a strand-biased DNA transaction for AID delivery which is then masked by the mechanism that introduces A:T mutations. Additionally, we asked if any of the known components of the A:T mutation machinery underscore the basis for the paucity of A:T mutations in the Burkitt lymphoma cell lines, Ramos and BL2. Ramos and BL2 cells were proficient in MSH2/MSH6-mediated mismatch repair, and express high levels of wild-type, full-length DNA polymerase eta. In addition, Ramos cells have high levels of uracil DNA glycosylase protein and are proficient in base excision repair. These results suggest that Burkitt lymphoma cell lines may be deficient in an unidentified factor that recruits the machinery necessary for A:T mutation or that AID-mediated cytosine deamination in these cells may be processed by conventional base excision repair truncating somatic hypermutation at the G:C phase. Either scenario suggests that cytosine deamination by AID is not enough to trigger A:T mutation, and that additional unidentified factors are required for full spectrum hypermutation in vivo.
- Published
- 2007
154. Roles of DNA polymerases in replication, repair, and recombination in eukaryotes
- Author
-
Youri I. Pavlov, Polina V. Shcherbakova, and Igor B. Rogozin
- Subjects
DNA Replication ,Recombination, Genetic ,DNA Repair ,Molecular Sequence Data ,Molecular Conformation ,DNA ,DNA-Directed DNA Polymerase ,Genomic Instability ,Eukaryotic Cells ,Neoplasms ,Animals ,Humans ,Genetic Predisposition to Disease ,Amino Acid Sequence - Abstract
The functioning of the eukaryotic genome depends on efficient and accurate DNA replication and repair. The process of replication is complicated by the ongoing decomposition of DNA and damage of the genome by endogenous and exogenous factors. DNA damage can alter base coding potential resulting in mutations, or block DNA replication, which can lead to double-strand breaks (DSB) and to subsequent chromosome loss. Replication is coordinated with DNA repair systems that operate in cells to remove or tolerate DNA lesions. DNA polymerases can serve as sensors in the cell cycle checkpoint pathways that delay cell division until damaged DNA is repaired and replication is completed. Eukaryotic DNA template-dependent DNA polymerases have different properties adapted to perform an amazingly wide spectrum of DNA transactions. In this review, we discuss the structure, the mechanism, and the evolutionary relationships of DNA polymerases and their possible functions in the replication of intact and damaged chromosomes, DNA damage repair, and recombination.
- Published
- 2006
155. Compensatory relationship between splice sites and exonic splicing signals depending on the length of vertebrate introns
- Author
-
Colin N. Dewey, Eugene V. Koonin, and Igor B. Rogozin
- Subjects
Spliceosome ,lcsh:QH426-470 ,lcsh:Biotechnology ,RNA Splicing ,Exonic splicing enhancer ,Regulatory Sequences, Nucleic Acid ,Biology ,Mice ,Exon ,Splicing factor ,lcsh:TP248.13-248.65 ,Genetics ,Animals ,Humans ,Conserved Sequence ,Genome ,Splice site mutation ,Intron ,Exons ,Sequence Analysis, DNA ,Group II intron ,Introns ,lcsh:Genetics ,Vertebrates ,RNA splicing ,RNA Splice Sites ,Databases, Nucleic Acid ,Sequence Alignment ,Research Article ,Biotechnology - Abstract
Background The signals that determine the specificity and efficiency of splicing are multiple and complex, and are not fully understood. Among other factors, the relative contributions of different mechanisms appear to depend on intron size inasmuch as long introns might hinder the activity of the spliceosome through interference with the proper positioning of the intron-exon junctions. Indeed, it has been shown that the information content of splice sites positively correlates with intron length in the nematode, Drosophila, and fungi. We explored the connections between the length of vertebrate introns, the strength of splice sites, exonic splicing signals, and evolution of flanking exons. Results A compensatory relationship is shown to exist between different types of signals, namely, the splice sites and the exonic splicing enhancers (ESEs). In the range of relatively short introns (approximately, < 1.5 kilobases in length), the enhancement of the splicing signals for longer introns was manifest in the increased concentration of ESEs. In contrast, for longer introns, this effect was not detectable, and instead, an increase in the strength of the donor and acceptor splice sites was observed. Conceivably, accumulation of A-rich ESE motifs beyond a certain limit is incompatible with functional constraints operating at the level of protein sequence evolution, which leads to compensation in the form of evolution of the splice sites themselves toward greater strength. In addition, however, a correlation between sequence conservation in the exon ends and intron length, particularly, in synonymous positions, was observed throughout the entire length range of introns. Thus, splicing signals other than the currently defined ESEs, i.e., potential new classes of ESEs, might exist in exon sequences, particularly, those that flank long introns. 
Conclusion Several weak but statistically significant correlations were observed between vertebrate intron length, splice site strength, and potential exonic splicing signals. Taken together, these findings attest to a compensatory relationship between splice sites and exonic splicing signals, depending on intron length.
- Published
- 2006
156. A glimpse of a putative pre-intron phase of eukaryotic evolution
- Author
-
Alexander V. Sverdlov, Miklós Csürös, Eugene V. Koonin, and Igor B. Rogozin
- Subjects
Genetics ,animal structures ,Models, Genetic ,fungi ,Group ii ,Intron ,Biology ,Introns ,Evolution, Molecular ,Molecular evolution ,Evolutionary biology ,Gene Duplication ,Gene duplication ,Animals ,Humans ,Gene - Abstract
Comparison of the exon–intron structures of ancient eukaryotic paralogs reveals the absence of conserved intron positions in these genes. This is in contrast to the conservation of intron positions in orthologous genes from even the most evolutionarily distant eukaryotes and in more recent paralogs. The lack of conserved intron positions in ancient paralogs probably reflects the origination of these genes during the earliest phase of eukaryotic evolution, which was characterized by concomitant invasion of genes by group II self-splicing elements (which were to become introns in the future) and extensive duplication of genes.
- Published
- 2006
157. A Genome-Wide Identification of Mitochondrial DNA Topoisomerase I in Arabidopsis
- Author
-
Igor B. Rogozin, A. I. Katyshev, and Yu. M. Konstantinov
- Subjects
Genetics ,Mitochondrial DNA ,Nuclear gene ,Arabidopsis ,DNA replication ,food and beverages ,Arabidopsis thaliana ,Mitochondrion ,Biology ,biology.organism_classification ,Gene ,Genome - Abstract
Topoisomerases are conserved enzymes that play an important role in multiple cellular processes, such as DNA recombination, DNA replication, and cell cycle checkpoint control (Pommier, 1998). It is likely that at least three different plant topoisomerases function within nucleus, in mitochondria, and in chloroplasts. This hypothesis was partially supported by biochemical experiments (Daniell et al., 1995; Balestrazzi et al., 2000). Moreover, the enzymes that are of different genetic origin and structure—prokaryotic topo IA and eukaryotic topo IB type topoisomerases—may function in chloroplasts or mitochondria. Thus, there may be more than one gene encoding chloroplast and mitochondrial topo I in plant nuclear genomes. A genome-wide analysis of the Arabidopsis thaliana nuclear genome suggested that there was only one gene encoding non-nuclear topo I. Phylogenetic analysis of prokaryotic topo I orthologs indicated that the candidate mitochondrial topo I might have been acquired from alpha-proteobacteria, which are believed to be ancestors of eukaryotic mitochondria. Interestingly, we identified two paralogous genes for mitochondrial topo I in the rice nuclear genome. Some explanations of this fact are provided, and unique features of the gene product are discussed.
- Published
- 2006
158. Dollo parsimony and the reconstruction of genome evolution
- Author
-
Eugene V. Koonin, Yuri I. Wolf, Vladimir N. Babenko, and Igor B. Rogozin
- Subjects
Genome evolution ,Evolutionary biology ,Biology - Published
- 2006
159. Signs of positive selection of somatic mutations in human cancers detected by EST sequence analysis
- Author
-
Vladimir N. Babenko, Malay Kumar Basu, Eugene V. Koonin, Fyodor A. Kondrashov, and Igor B. Rogozin
- Subjects
Cancer Research ,Somatic cell ,DNA repair ,Molecular Sequence Data ,Biology ,medicine.disease_cause ,Polymorphism, Single Nucleotide ,Somatic evolution in cancer ,lcsh:RC254-282 ,Evolution, Molecular ,Sequence Analysis, Protein ,Neoplasms ,medicine ,Genetics ,Humans ,Neoplasm ,Amino Acid Sequence ,Gene ,Expressed Sequence Tags ,Mutation ,Mutation Spectra ,medicine.disease ,lcsh:Neoplasms. Tumors. Oncology. Including cancer and carcinogens ,Amino Acid Substitution ,Oncology ,Carcinogenesis ,Sequence Alignment ,Genes, Neoplasm ,Research Article - Abstract
Background Carcinogenesis typically involves multiple somatic mutations in caretaker (DNA repair) and gatekeeper (tumor suppressors and oncogenes) genes. Analysis of mutation spectra of the tumor suppressor that is most commonly mutated in human cancers, p53, unexpectedly suggested that somatic evolution of the p53 gene during tumorigenesis is dominated by positive selection for gain of function. This conclusion is supported by accumulating experimental evidence of evolution of new functions of p53 in tumors. These findings prompted a genome-wide analysis of possible positive selection during tumor evolution. Methods A comprehensive analysis of probable somatic mutations in the sequences of Expressed Sequence Tags (ESTs) from malignant tumors and normal tissues was performed in order to assess the prevalence of positive selection in cancer evolution. For each EST, the numbers of synonymous and non-synonymous substitutions were calculated. In order to identify genes with a signature of positive selection in cancers, these numbers were compared to: i) expected numbers and ii) the numbers for the respective genes in the ESTs from normal tissues. Results We identified 112 genes with a signature of positive selection in cancers, i.e., a significantly elevated ratio of non-synonymous to synonymous substitutions, in tumors as compared to 37 such genes in an approximately equal-sized EST collection from normal tissues. A substantial fraction of the tumor-specific positive-selection candidates have experimentally demonstrated or strongly predicted links to cancer. Conclusion The results of EST analysis should be interpreted with extreme caution given the noise introduced by sequencing errors and undetected polymorphisms. Furthermore, an inherent limitation of EST analysis is that multiple mutations amenable to statistical analysis can be detected only in relatively highly expressed genes. Nevertheless, the present results suggest that positive selection might affect a substantial number of genes during tumorigenic somatic evolution.
- Published
- 2006
160. Mutational hotspots in the TP53 gene and, possibly, other tumor suppressors evolve by positive selection
- Author
-
Eugene V. Koonin, Igor B. Rogozin, Galina V. Glazko, and Vladimir N. Babenko
- Subjects
Genetics ,Mutation Spectra ,Applied Mathematics ,Research ,Immunology ,Mutagenesis (molecular biology technique) ,Biology ,medicine.disease_cause ,General Biochemistry, Genetics and Molecular Biology ,Germline ,lcsh:Biology (General) ,CpG site ,Modeling and Simulation ,Mutation (genetic algorithm) ,medicine ,General Agricultural and Biological Sciences ,Carcinogenesis ,lcsh:QH301-705.5 ,Gene ,Ecology, Evolution, Behavior and Systematics ,Selection (genetic algorithm) - Abstract
Background The mutation spectra of the TP53 gene and other tumor suppressors contain multiple hotspots, i.e., sites of non-random, frequent mutation in tumors and/or the germline. The origin of the hotspots remains unclear, the general view being that they represent highly mutable nucleotide contexts which likely reflect effects of different endogenous and exogenous factors shaping the mutation process in specific tissues. The origin of hotspots is of major importance because it has been suggested that mutable contexts could be used to infer mechanisms of mutagenesis contributing to tumorigenesis. Results Here we apply three independent tests, accounting for non-uniform base compositions in synonymous and non-synonymous sites, to test whether the hotspots emerge via selection or due to mutational bias. All three tests consistently indicate that the hotspots in the TP53 gene evolve, primarily, via positive selection. The results were robust to the elimination of the highly mutable CpG dinucleotides. By contrast, only one, the least conservative test reveals the signature of positive selection in BRCA1, BRCA2, and p16. Elucidation of the origin of the hotspots in these genes requires more data on somatic mutations in tumors. Conclusion The results of this analysis seem to indicate that positive selection for gain-of-function in tumor suppressor genes is an important aspect of tumorigenesis, blurring the distinction between tumor suppressors and oncogenes. Reviewers This article was reviewed by Sandor Pongor, Christopher Lee and Mikhail Blagosklonny.
- Published
- 2006
161. Roles of DNA Polymerases in Replication, Repair, and Recombination in Eukaryotes
- Author
-
Youri I. Pavlov, Polina V. Shcherbakova, and Igor B. Rogozin
- Subjects
Genetics ,DNA re-replication ,biology ,Control of chromosome duplication ,DNA repair ,DNA polymerase II ,biology.protein ,DNA replication ,Origin recognition complex ,Eukaryotic DNA replication ,Replication protein A ,Cell biology - Abstract
The functioning of the eukaryotic genome depends on efficient and accurate DNA replication and repair. The process of replication is complicated by the ongoing decomposition of DNA and damage of the genome by endogenous and exogenous factors. DNA damage can alter base coding potential resulting in mutations, or block DNA replication, which can lead to double-strand breaks (DSB) and to subsequent chromosome loss. Replication is coordinated with DNA repair systems that operate in cells to remove or tolerate DNA lesions. DNA polymerases can serve as sensors in the cell cycle checkpoint pathways that delay cell division until damaged DNA is repaired and replication is completed. Eukaryotic DNA template-dependent DNA polymerases have different properties adapted to perform an amazingly wide spectrum of DNA transactions. In this review, we discuss the structure, the mechanism, and the evolutionary relationships of DNA polymerases and their possible functions in the replication of intact and damaged chromosomes, DNA damage repair, and recombination.
- Published
- 2006
162. Diversity and function of adaptive immune receptors in a jawless vertebrate
- Author
-
Zeev Pancer, Igor B. Rogozin, Galina V. Glazko, Lakshminarayan M. Iyer, Matthew N. Alder, and Max D. Cooper
- Subjects
Molecular Sequence Data ,Adaptation, Biological ,Evolution, Molecular ,Immune system ,Variable lymphocyte receptor ,biology.animal ,Animals ,Receptors, Immunologic ,Selection, Genetic ,Receptor ,Gene Rearrangement ,Spores, Bacterial ,Multidisciplinary ,biology ,Lamprey ,Immunity ,Vertebrate ,Genetic Variation ,Lampreys ,Gene rearrangement ,Acquired immune system ,biology.organism_classification ,Evolutionary biology ,Bacillus anthracis ,Immunology ,Function (biology) ,Bacillus subtilis - Abstract
Instead of the immunoglobulin-type antigen receptors of jawed vertebrates, jawless fish have variable lymphocyte receptors (VLRs), which consist of leucine-rich repeat (LRR) modules. Somatic diversification of the VLR gene is shown here to occur through a multistep assembly of LRR modules randomly selected from a large bank of flanking cassettes. The predicted concave surface of the VLR is lined with hypervariable positively selected residues, and computational analysis suggests a repertoire of about 10^14 unique receptors. Lamprey immunized with anthrax spores responded with the production of soluble antigen-specific VLRs. These findings reveal that two strikingly different modes of antigen recognition through rearranged lymphocyte receptors have evolved in the jawless and jawed vertebrates.
- Published
- 2005
163. Evolutionary conservation suggests a regulatory function of AUG triplets in 5'-UTRs of eukaryotic genes
- Author
-
Hesham Ali, Eugene V. Koonin, Vladimir N. Babenko, Alexander G. Churbanov, and Igor B. Rogozin
- Subjects
Molecular Sequence Data ,Codon, Initiator ,Sequence alignment ,Biology ,Article ,Conserved sequence ,Evolution, Molecular ,03 medical and health sciences ,Negative selection ,Mice ,Genetics ,Animals ,Humans ,Gene ,Conserved Sequence ,030304 developmental biology ,0303 health sciences ,Base Sequence ,030302 biochemistry & molecular biology ,Alternative splicing ,Stop codon ,Rats ,Open reading frame ,Codon, Terminator ,5' Untranslated Regions ,Sequence Alignment ,Function (biology) - Abstract
By comparing sequences of human, mouse and rat orthologous genes, we show that in 5′-untranslated regions (5′-UTRs) of mammalian cDNAs but not in 3′-UTRs or coding sequences, AUG is conserved to a significantly greater extent than any of the other 63 nt triplets. This effect is likely to reflect, primarily, bona fide evolutionary conservation, rather than cDNA annotation artifacts, because the excess of conserved upstream AUGs (uAUGs) is seen in 5′-UTRs containing stop codons in-frame with the start AUG and many of the conserved AUGs are found in different frames, consistent with the location in authentic non-coding sequences. Altogether, conserved uAUGs are present in at least 20–30% of mammalian genes. Qualitatively similar results were obtained by comparison of orthologous genes from different species of the yeast genus Saccharomyces. Together with the observation that mammalian and yeast 5′-UTRs are significantly depleted in overall AUG content, these findings suggest that AUG triplets in 5′-UTRs are subject to the pressure of purifying selection in two opposite directions: the uAUGs that have no specific function tend to be deleterious and get eliminated during evolution, whereas those uAUGs that do serve a function are conserved. Most probably, the principal role of the conserved uAUGs is attenuation of translation at the initiation stage, which is often additionally regulated by alternative splicing in the mammalian 5′-UTRs. Consistent with this hypothesis, we found that open reading frames starting from conserved uAUGs are significantly shorter than those starting from non-conserved uAUGs, possibly, owing to selection for optimization of the level of attenuation.
- Published
- 2005
164. Analysis of evolution of exon-intron structure of eukaryotic genes
- Author
-
Vladimir N. Babenko, Eugene V. Koonin, Alexander V. Sverdlov, and Igor B. Rogozin
- Subjects
Genetics ,Structural gene ,DNA Mutational Analysis ,Intron ,Chromosome Mapping ,Context (language use) ,Exons ,Sequence Analysis, DNA ,Biology ,Genome ,Biological Evolution ,Introns ,Evolution, Molecular ,Exon ,Eukaryotic Cells ,Sequence Homology, Nucleic Acid ,RNA splicing ,Consensus sequence ,Animals ,Humans ,Molecular Biology ,Gene ,Conserved Sequence ,Phylogeny ,Information Systems - Abstract
The availability of multiple, complete eukaryotic genome sequences allows one to address many fundamental evolutionary questions on genome scale. One such important, long-standing problem is evolution of exon–intron structure of eukaryotic genes. Analysis of orthologous genes from completely sequenced genomes revealed numerous shared intron positions in orthologous genes from animals and plants and even between animals, plants and protists. The data on shared and lineage-specific intron positions were used as the starting point for evolutionary reconstruction with parsimony and maximum-likelihood approaches. Parsimony methods produce reconstructions with intron-rich ancestors but also infer lineage-specific, in many cases, high levels of intron loss and gain. Different probabilistic models gave opposite results, apparently depending on model parameters and assumptions, from domination of intron loss, with extremely intron-rich ancestors, to dramatic excess of gains, to the point of denying any true conservation of intron positions among deep eukaryotic lineages. Development of models with adequate, realistic parameters and assumptions seems to be crucial for obtaining more definitive estimates of intron gain and loss in different eukaryotic lineages. Many shared intron positions were detected in ancestral eukaryotic paralogues which evolved by duplication prior to the divergence of extant eukaryotic lineages. These findings indicate that numerous introns were present in eukaryotic genes already at the earliest stages of evolution of eukaryotes and are compatible with the hypothesis that the original, catastrophic intron invasion accompanied the emergence of the eukaryotic cells. Comparison of various features of old and younger introns starts shedding light on probable mechanisms of intron insertion, indicating that propagation of old introns is unlikely to be a major mechanism for origin of new ones. 
The existence and structure of ancestral protosplice sites were addressed by examining the context of introns inserted within codons that encode amino acids conserved in all eukaryotes and, accordingly, are not subject to selection for splicing efficiency. It was shown that introns indeed predominantly insert into or are fixed in specific protosplice sites which have the consensus sequence (A/C)AG|Gt.
- Published
- 2005
165. DNA polymerase eta contributes to strand bias of mutations of A versus T in immunoglobulin genes
- Author
-
Igor B. Rogozin, Vladimir Mayorov, Patricia J. Gearhart, and Linda R. Adkison
- Subjects
Xeroderma pigmentosum ,Transcription, Genetic ,DNA polymerase ,Immunology ,Amino Acid Motifs ,DNA Mutational Analysis ,Molecular Sequence Data ,Gene Rearrangement, B-Lymphocyte, Heavy Chain ,Immunoglobulin Variable Region ,Somatic hypermutation ,DNA-Directed DNA Polymerase ,Substrate Specificity ,medicine ,Immunology and Allergy ,Humans ,Thymine Nucleotides ,Nucleotide ,Gene ,Base Pairing ,Polymerase ,Genetics ,chemistry.chemical_classification ,Xeroderma Pigmentosum ,biology ,Base Sequence ,Genes, Immunoglobulin ,Adenine Nucleotides ,medicine.disease ,Molecular biology ,Clone Cells ,Enzyme ,chemistry ,biology.protein ,Somatic Hypermutation, Immunoglobulin ,Sequence motif - Abstract
DNA polymerase (pol) η participates in hypermutation of A:T bases in Ig genes because humans deficient for the polymerase have fewer substitutions of these bases. To determine whether polymerase η is also responsible for the well-known preference for mutations of A vs T on the nontranscribed strand, we sequenced variable regions from three patients with xeroderma pigmentosum variant (XP-V) disease, who lack polymerase η. The frequency of mutations in the intronic region downstream of rearranged JH4 gene segments was similar between XP-V and control clones; however, there were fewer mutations of A:T bases and correspondingly more substitutions of C:G bases in the XP-V clones (p < 10^-7). There was significantly less of a bias for mutations of A compared with T nucleotides in the XP-V clones compared with control clones, whereas the frequencies for mutations of C and G were identical in both groups. An analysis of mutations in the WA sequence motif suggests that polymerase η generates more mutations of A than T on the nontranscribed strand. This in vivo data from polymerase η-deficient B cells correlates well with the in vitro specificity of the enzyme. Because polymerase η inserts more mutations opposite template T than template A, it would generate more substitutions of A on the newly synthesized strand.
- Published
- 2005
166. Transcription of mammalian messenger RNAs by a nuclear RNA polymerase of mitochondrial origin
- Author
-
Eugene V. Koonin, Julia E. Kravchenko, P. M. Chumakov, and Igor B. Rogozin
- Subjects
Five-prime cap ,POLRMT ,Transcription, Genetic ,RNA-dependent RNA polymerase ,RNA polymerase II ,Biology ,Mitochondrial Proteins ,Mice ,Cell Line, Tumor ,RNA polymerase I ,Animals ,Humans ,RNA, Messenger ,Promoter Regions, Genetic ,RNA polymerase II holoenzyme ,Cell Nucleus ,Multidisciplinary ,Nuclear Proteins ,DNA-Directed RNA Polymerases ,Aldehyde Dehydrogenase ,Molecular biology ,Mitochondria ,Rats ,Alternative Splicing ,Protein Transport ,biology.protein ,Transcription factor II D ,Small nuclear RNA ,Transcription Factors - Abstract
Transcription of eukaryotic genes is performed by three nuclear RNA polymerases, of which RNA polymerase II is thought to be solely responsible for the synthesis of messenger RNAs. Here we show that transcription of some mRNAs in humans and rodents is mediated by a previously unknown single-polypeptide nuclear RNA polymerase (spRNAP-IV). spRNAP-IV is expressed from an alternative transcript of the mitochondrial RNA polymerase gene (POLRMT). The spRNAP-IV lacks 262 amino-terminal amino acids of mitochondrial RNA polymerase, including the mitochondrial-targeting signal, and localizes to the nucleus. Transcription by spRNAP-IV is resistant to the RNA polymerase II inhibitor alpha-amanitin but is sensitive to short interfering RNA specific for the POLRMT gene. The promoters for spRNAP-IV differ substantially from those used by RNA polymerase II, do not respond to transcriptional enhancers and contain a common functional sequence motif.
- Published
- 2005
167. Molecular dating: ape bones agree with chicken entrails
- Author
-
Igor B. Rogozin, Galina V. Glazko, and Eugene V. Koonin
- Subjects
Likelihood Functions ,Time Factors ,Molecular dating ,Calibration (statistics) ,Range (biology) ,Fossils ,Hominidae ,Biology ,Biological Evolution ,Divergence ,Paleontology ,Calibration ,Genetics ,Animals ,Humans ,Chickens ,Phylogeny - Abstract
Molecular time estimates, especially those that employed the 310 million years ago (Mya) date of mammal-bird divergence as the calibration point, were criticized in recent publications. In this article, we estimate the divergence time of primates and rodents, primates and artiodactyls and the different great ape species by using two independent calibration-time ranges and maximally conservative error estimates. We observed a variation of approximately +/-15-20% for most of the molecular time estimates in the 10-100 Mya range. The estimated range of the primate-rodent divergence time, 84-121 Mya, includes the date obtained with the 310 million years calibration point (110 Mya). We conclude that molecular time estimates remain useful tools of evolutionary biology, although utmost caution is required when interpreting the results.
- Published
- 2005
168. From context-dependence of mutations to molecular mechanisms of mutagenesis
- Author
-
Igor B. Rogozin, Boris A. Malyarchuk, Luciano Milanesi, and Youri I. Pavlov
- Subjects
Bioinformatics ,HIV ,Mutation ,mutagenesis - Abstract
Mutation frequencies vary significantly along nucleotide sequences such that mutations often concentrate at certain positions called hotspots. Mutation hotspots in DNA reflect intrinsic properties of the mutation process, such as sequence specificity, that manifests itself at the level of interaction between mutagens, DNA, and the action of the repair and replication machineries. The nucleotide sequence context of mutational hotspots is a fingerprint of interactions between DNA and repair/replication/modification enzymes, and the analysis of hotspot context provides evidence of such interactions. The hotspots might also reflect structural and functional features of the respective DNA sequences and provide information about natural selection. We discuss analysis of 8-oxoguanine-induced mutations in pro- and eukaryotic genes, polymorphic positions in the human mitochondrial DNA and mutations in the HIV-1 retrovirus. Comparative analysis of 8-oxoguanine-induced mutations and spontaneous mutation spectra suggested that a substantial fraction of spontaneous AT→CT mutations is caused by 8-oxoGTP in nucleotide pools. In the case of human mitochondrial DNA, significant differences between molecular mechanisms of mutations in hypervariable segments and coding part of DNA were detected. Analysis of mutations in the HIV-1 retrovirus suggested a complex interplay between molecular mechanisms of mutagenesis and natural selection.
- Published
- 2005
169. An Expectation-Maximization Algorithm for Analysis of Evolution of Exon-Intron Structure of Eukaryotic Genes
- Author
-
Eugene V. Koonin, Liran Carmel, Igor B. Rogozin, and Yuri I. Wolf
- Subjects
Genetics ,Simulated data ,Expectation–maximization algorithm ,Structure (category theory) ,Intron ,Eukaryotic gene ,Computational biology ,Biology ,Invariant (mathematics) ,Exon intron - Abstract
We propose a detailed model of evolution of exon-intron structure of eukaryotic genes that takes into account gene-specific intron gain and loss rates, branch-specific gain and loss coefficients, invariant sites incapable of intron gain, and rate variability of both gain and loss which is gamma-distributed across sites. We develop an expectation-maximization algorithm to estimate the parameters of this model, and study its performance using simulated data.
- Published
- 2005
170. The role of alternative translation start sites in the generation of human protein diversity
- Author
-
Nikolay A. Kolchanov, Alex V. Kochetov, Igor B. Rogozin, Akinori Sarai, and V. K. Shumny
- Subjects
Untranslated region ,Genetics ,Polymorphism, Genetic ,Intracellular Space ,food and beverages ,RNA ,Codon, Initiator ,Proteins ,Context (language use) ,Translation (biology) ,General Medicine ,Biology ,ENCODE ,Subcellular localization ,Human genetics ,Start codon ,Humans ,RNA, Messenger ,5' Untranslated Regions ,Peptide Chain Initiation, Translational ,Molecular Biology - Abstract
According to the scanning model, 40S ribosomal subunits initiate translation at the first (5' proximal) AUG codon they encounter. However, if the first AUG is in a suboptimal context, it may not be recognized, and translation can then initiate at downstream AUG(s). In this way, a single RNA can produce several variant products. Earlier experiments suggested that some of these additional protein variants might be functionally important. We have analysed human mRNAs that have AUG triplets in 5' untranslated regions and mRNAs in which the annotated translational start codon is located in a suboptimal context. It was found that 3% of human mRNAs have the potential to encode N-terminally extended variants of the annotated proteins and 12% could code for N-truncated variants. The predicted subcellular localizations of these protein variants were compared: 31% of the N-extended proteins and 30% of the N-truncated proteins were predicted to localize to subcellular compartments that differed from those targeted by the annotated protein forms. These results suggest that additional AUGs may frequently be exploited for the synthesis of proteins that possess novel functional properties.
- Published
- 2004
171. Evolution of eukaryotic gene repertoire and gene structure: discovering the unexpected dynamics of genome evolution
- Author
-
B.S. Rao, Anastasia N. Nikolskaya, Kira S. Makarova, Alexander V. Sorokin, Boris Mirkin, Aviva R. Jacobs, Alexander V. Sverdlov, John D. Jackson, N.D. Fedorova, Sona Vasudevan, Sergei Smirnov, Darren A. Natale, Eugene V. Koonin, Vladimir N. Babenko, Sergey L. Mekhedov, Igor B. Rogozin, Yuri I. Wolf, J.J. Yin, Raja Mazumder, and Dmitri M. Krylov
- Subjects
Genome evolution ,Genome ,Models, Genetic ,Repertoire ,Eukaryotic gene ,Genomics ,Biology ,Biochemistry ,Introns ,Evolution, Molecular ,Eukaryotic Cells ,Evolutionary biology ,Genetics ,Animals ,Humans ,Molecular Biology ,Gene ,Phylogeny - Published
- 2004
172. Mutagenesis by transient misalignment in the human mitochondrial DNA control region
- Author
-
Igor B. Rogozin and Boris Malyarchuk
- Subjects
Genetics ,Mitochondrial DNA ,Mutation Spectra ,Polymorphism, Genetic ,Base Sequence ,Mutagenesis ,Molecular Sequence Data ,Context (language use) ,Biology ,Molecular biology ,Human mitochondrial genetics ,DNA, Mitochondrial ,Pyrimidines ,Purines ,Coding strand ,Sequence Homology, Nucleic Acid ,Mutation (genetic algorithm) ,Humans ,Primer (molecular biology) ,Sequence Alignment ,Genetics (clinical) ,Repetitive Sequences, Nucleic Acid - Abstract
To study spontaneous base substitutions in human mitochondrial DNA (mtDNA), we reconstructed the mutation spectra of the hypervariable segments I and II (HVS I and II) using published data on polymorphisms from various human populations. Classification analysis revealed numerous mutation hotspots in HVS I and II mutation spectra. Statistical analysis suggested that strand dislocation mutagenesis, operating in monotonous runs of nucleotides, plays an important role in generating base substitutions in the mtDNA control region. The frequency of mutations compatible with the primer strand dislocation in the HVS I region was almost twice as high as that for template strand dislocation. Frequencies of mutations compatible with the primer and template strand dislocation models are almost equal in the HVS II region. Further analysis of strand dislocation models suggested that an excess of pyrimidine transitions in mutation spectra, reconstructed on the basis of the L-strand sequence, is caused by an excess of both L-strand pyrimidine transitions and H-strand purine transitions. In general, no significant bias toward parent H-strand-specific dislocation mutagenesis was found in the HVS I and II regions.
- Published
- 2004
173. Cutting edge: DGYW/WRCH is a better predictor of mutability at G:C bases in Ig hypermutation than the widely accepted RGYW/WRCY motif and probably reflects a two-step activation-induced cytidine deaminase-triggered process
- Author
-
Marilyn Diaz and Igor B. Rogozin
- Subjects
Guanine ,DNA Repair ,Immunology ,Amino Acid Motifs ,DNA Mutational Analysis ,Somatic hypermutation ,Biology ,Lymphocyte Activation ,DNA sequencing ,DNA Glycosylases ,chemistry.chemical_compound ,Cytosine ,Predictive Value of Tests ,Cytidine Deaminase ,Activation-induced (cytidine) deaminase ,Immunology and Allergy ,Animals ,Humans ,A-DNA ,Gene ,Genetics ,Base Sequence ,Cytidine deaminase ,chemistry ,biology.protein ,Somatic Hypermutation, Immunoglobulin - Abstract
A feature of Ig hypermutation is the presence of hypermutable DNA sequences that are preferentially found in the V regions of Ig genes. Among these, RGYW/WRCY is the most pronounced motif (G:C is a mutable position; R = A/G, Y = C/T, and W = A/T). However, a molecular basis for the high mutability of RGYW was not known until recently. The discovery that activation-induced cytidine deaminase targets the DNA encoding V regions, has enabled the analysis of its targeting properties when expressed outside of the context of hypermutation. We analyzed these data and found evidence that activation-induced cytidine deaminase is the major source of the RGYW mutable motif, but with a new twist: DGYW/WRCH (G:C is the mutable position; D = A/G/T, H = T/C/A) is a better descriptor of the Ig mutation hotspot than RGYW/WRCY. We also found evidence that a DNA repair enzyme may play a role in modifying the sequence of hypermutation hotspots.
- Published
- 2004
174. The SPANX gene family of cancer/testis-specific antigens: rapid evolution and amplification in African great apes and hominids
- Author
-
Greg Solomon, John Otstot, Michael Mullokandov, John I. Risinger, Vladimir Larionov, Igor B. Rogozin, Natalay Kouprina, J. Carl Barrett, N. Keith Collins, and Eugene V. Koonin
- Subjects
Nonsynonymous substitution ,Male ,Subfamily ,X Chromosome ,Molecular Sequence Data ,Rodentia ,Biology ,Conserved sequence ,Evolution, Molecular ,Testicular Neoplasms ,Antigens, Neoplasm ,Pongo pygmaeus ,Testis ,Gene family ,Coding region ,Animals ,Humans ,Protein Isoforms ,Amino Acid Sequence ,Gene ,Conserved Sequence ,DNA Primers ,Genetics ,Multidisciplinary ,Gorilla gorilla ,Sequence Homology, Amino Acid ,Intron ,Gene Amplification ,Chromosome Mapping ,Hominidae ,Exons ,Biological Sciences ,Macaca mulatta ,Neoplasm Proteins ,Human genome ,Saguinus ,Sequence Alignment - Abstract
Human sperm protein associated with the nucleus on the X chromosome (SPANX) genes comprise a gene family with five known members (SPANX-A1, -A2, -B, -C, and -D), encoding cancer/testis-specific antigens that are potential targets for cancer immunotherapy. These highly similar paralogous genes cluster on the X chromosome at Xq27. We isolated and sequenced primate genomic clones homologous to human SPANX. Analysis of these clones and search of the human genome sequence revealed an uncharacterized group of genes, SPANX-N, which are present in all primates as well as in mouse and rat. In humans, four SPANX-N genes comprise a series of tandem duplicates at Xq27; a fifth member of this subfamily is located at Xp11. Similarly to SPANX-A/D, human SPANX-N genes are expressed in normal testis and some melanoma cell lines; testis-specific expression of SPANX is also conserved in mouse. Analysis of the taxonomic distribution of the long and short forms of the intron indicates that SPANX-N is the ancestral form, from which the SPANX-A/D subfamily evolved in the common ancestor of the hominoid lineage. Strikingly, the coding sequences of the SPANX genes evolved much faster than the intron and the 5′ untranslated region. There is a strong correlation between the rates of evolution of synonymous and nonsynonymous codon positions, both of which are accelerated 2-fold or more compared to the noncoding sequences. Thus, evolution of the SPANX family appears to have involved positive selection that affected not only the protein sequence but also the synonymous sites in the coding sequence.
- Published
- 2004
175. Context of deletions and insertions in human coding sequences
- Author
-
Alexey S. Kondrashov and Igor B. Rogozin
- Subjects
Genetic Markers ,Mutation rate ,Inverted repeat ,Biology ,Exon ,chemistry.chemical_compound ,symbols.namesake ,Databases, Genetic ,Genetics ,Humans ,Nucleotide ,Indel ,Genetics (clinical) ,Repetitive Sequences, Nucleic Acid ,chemistry.chemical_classification ,Recombination, Genetic ,Terminal Repeat Sequences ,Mutagenesis, Insertional ,chemistry ,Mendelian inheritance ,symbols ,Microsatellite ,Chromosome Deletion ,DNA ,Microsatellite Repeats - Abstract
We studied the dependence of the rate of short deletions and insertions on their contexts using the data on mutations within coding exons at 19 human loci that cause mendelian diseases. We confirm that periodic sequences consisting of three to five or more nucleotides are mutagenic. Mutability of sequences with strongly biased nucleotide composition is also elevated, even when mutations within homonucleotide runs longer than three nucleotides are ignored. In contrast, no elevated mutation rates have been detected for imperfect direct or inverted repeats. Among known candidate contexts, the indel context GTAAGT and regions with purine-pyrimidine imbalance between the two DNA strands are mutagenic in our sample, and many others are not mutagenic. Data on mutation hot spots suggest two novel contexts that increase the deletion rate. Comprehensive analysis of mutability of all possible contexts of lengths four, six, and eight indicates a substantially elevated deletion rate within YYYTG and similar sequences, which is one of the two contexts revealed by the hot spots. Possible contexts that increase the insertion rate (AT(A/C)(A/C)GCC and TACCRC) and decrease deletion (TATCGC) or insertion (GCGG) rates have also been identified. Two-thirds of deletions remove a repeat, and over 80% of insertions create a repeat, i.e., they are duplications. Hum Mutat 23:177–185, 2004. Published 2003 Wiley-Liss, Inc.
- Published
- 2004
176. Type-Specific Features of the Structure of the tRNA Gene Promoters
- Author
-
T. M. Naykova, Igor B. Rogozin, N. S. Yudin, Yu. V. Kondrakhin, Mikhail I. Voevoda, and A. G. Romashchenko
- Subjects
Multicellular organism ,Gene type ,Transfer RNA ,Type specific ,Promoter ,Computational biology ,Biology ,Evolutionary transitions ,Gene ,Sequence (medicine) - Abstract
A new method for the recognition of type 2 promoters in the eukaryotic tRNA genes, which takes into consideration not only nucleotide distribution at individual box positions, but also the multiple interactions between the different parts of the A and B boxes, is proposed. The recognition procedure is based on the module organization of the A and B boxes within the intragenic promoters of the tRNA genes. It was shown that each module of a box is represented by a limited number of the sequence variants. Their particular combinations are the characteristic features of every tRNA gene type, and they are conserved within the two kingdoms of multicellular organisms. The same module combinations were identified in the promoters of the tRNA genes of the other types in unicellular organisms. These results suggested that the sequence module variants might have recombined during the evolutionary transition from unicellular to multicellular organisms.
- Published
- 2004
177. Comparative analysis of orthologous eukaryotic mRNAs: potential hidden functional signals
- Author
-
Aleksey Y. Ogurtsov, David J. Lipman, Svetlana A. Shabalina, Eugene V. Koonin, and Igor B. Rogozin
- Subjects
Untranslated region ,Codon, Initiator ,Biology ,Regulatory Sequences, Ribonucleic Acid ,Polyadenylation ,Conserved sequence ,Evolution, Molecular ,Mice ,Start codon ,Yeasts ,Genetics ,Coding region ,Animals ,Humans ,RNA, Messenger ,3' Untranslated Regions ,Conserved Sequence ,Mammals ,Three prime untranslated region ,Sequence Analysis, RNA ,Shine-Dalgarno sequence ,Articles ,Stop codon ,Rats ,Open reading frame ,Eukaryotic Cells ,Codon, Terminator ,5' Untranslated Regions - Abstract
Sequencing of multiple, nearly complete eukaryotic genomes creates opportunities for detecting previously unnoticed, subtle functional signals in non-coding regions. A genome-wide comparative analysis of orthologous sets of mammalian and yeast mRNAs revealed distinct patterns of evolutionary conservation at the boundaries of the untranslated regions (UTRs) and the coding region (CDS). Elevated sequence conservation was detected in approximately 30 nt regions around the start codon. There seems to be a complementary relationship between sequence conservation in the approximately 30 nt regions of the 5'-UTR immediately upstream of the start codon and that in the synonymous positions of the 5'-terminal 30 nt of the CDS: in mammalian mRNAs, the 5'-UTR shows a greater conservation than the CDS, whereas the opposite trend holds for yeast mRNAs. Unexpectedly, an approximately 30 nt region downstream of the stop codon shows a substantially lower level of sequence conservation than the downstream portions of the 3'-UTRs. However, the sequence in this poorly conserved 30 nt portion of the 3'-UTR is non-random in that it has a higher GC content than the rest of the UTR. It is hypothesized that the elevated sequence conservation in the region immediately upstream of the start codon is related to the requirement for initiation factor binding during pre-initiation ribosomal scanning. In contrast, the poorly conserved region downstream of the stop codon could be involved in the post-termination scanning and dissociation of the ribosomes from the mRNA, which requires only the mRNA-ribosome interaction. Additionally, it was found that the choice of the stop codon in mammals, but not in yeasts, and the context in the immediate vicinity of the stop codons in both mammals and yeasts are subject to strong selection.
Thus, genome-wide analysis of orthologous gene sets allows detection of previously unrecognized patterns of sequence conservation, which are likely to reflect hidden functional signals, such as ribosomal filters that could regulate translation by modulating the interaction between the mRNA and ribosomes.
- Published
- 2004
178. Theoretical analysis of mutation hotspots and their DNA sequence context specificity
- Author
-
Igor B. Rogozin and Youri I. Pavlov
- Subjects
Mutation rate ,DNA repair ,Health, Toxicology and Mutagenesis ,Mutant ,DNA Mutational Analysis ,Molecular Sequence Data ,Immunoglobulins ,Biology ,Genetics ,Consensus sequence ,Escherichia coli ,Animals ,Mutation frequency ,Pyrophosphatases ,Gene ,Mutation Spectra ,Base Sequence ,Nucleotides ,Escherichia coli Proteins ,Nucleic acid sequence ,Computational Biology ,DNA ,Models, Theoretical ,Phosphoric Monoester Hydrolases ,Mutagenesis ,Mutation ,Regression Analysis - Abstract
Mutation frequencies vary significantly along nucleotide sequences such that mutations often concentrate at certain positions called hotspots. Mutation hotspots in DNA reflect intrinsic properties of the mutation process, such as sequence specificity, that manifests itself at the level of interaction between mutagens, DNA, and the action of the repair and replication machineries. The hotspots might also reflect structural and functional features of the respective DNA sequences. When mutations in a gene are identified using a particular experimental system, resulting hotspots could reflect the properties of the gene product and the mutant selection scheme. Analysis of the nucleotide sequence context of hotspots can provide information on the molecular mechanisms of mutagenesis. However, the determinants of mutation frequency and specificity are complex, and there are many analytical methods for their study. Here we review computational approaches for analyzing mutation spectra (distribution of mutations along the target genes) that include many mutable (detectable) positions. The following methods are reviewed: derivation of a consensus sequence, application of regression approaches to correlate nucleotide sequence features with mutation frequency, mutation hotspot prediction, analysis of oligonucleotide composition of regions containing mutations, pairwise comparison of mutation spectra, analysis of multiple spectra, and analysis of "context-free" characteristics. The advantages and pitfalls of these methods are discussed and illustrated by examples from the literature. The most reliable analyses were obtained when several methods were combined and information from theoretical analysis and experimental observations was considered simultaneously. Simple, robust approaches should be used with small samples of mutations, whereas combinations of simple and complex approaches may be required for large samples. 
We discuss several well-documented studies where analysis of mutation spectra has substantially contributed to the current understanding of molecular mechanisms of mutagenesis. The nucleotide sequence context of mutational hotspots is a fingerprint of interactions between DNA and DNA repair, replication, and modification enzymes, and the analysis of hotspot context provides evidence of such interactions.
- Published
- 2003
179. Yeast POL5 is an evolutionarily conserved regulator of rDNA transcription unrelated to any known DNA polymerases
- Author
-
Wei Yang, Igor B. Rogozin, and Eugene V. Koonin
- Subjects
Saccharomyces cerevisiae Proteins ,Transcription, Genetic ,DNA polymerase ,Saccharomyces cerevisiae ,Molecular Sequence Data ,Regulator ,DNA-Directed DNA Polymerase ,Genome ,DNA, Ribosomal ,Evolution, Molecular ,Transcription (biology) ,Gene Expression Regulation, Fungal ,Animals ,Humans ,Amino Acid Sequence ,Molecular Biology ,Polymerase ,Conserved Sequence ,Genetics ,biology ,Cell Biology ,biology.organism_classification ,Yeast ,biology.protein ,Sequence Alignment ,Developmental Biology - Abstract
We show that yeast protein Yel055cp, which has been identified as the fifth essential DNA polymerase in Saccharomyces cerevisiae (POL5), is a member of a family of predicted rDNA transcription regulators (typified by human MYB-binding protein MYBBP1A), which are represented by a single ortholog in all animals, fungi and plants with sequenced genomes. These proteins are confidently predicted to have an entirely α-helical structure and are unrelated to the B class DNA polymerases, as claimed for yeast POL5, or any other known polymerases.
- Published
- 2003
180. Transcriptome dynamics of Deinococcus radiodurans recovering from ionizing radiation
- Author
-
Jizhong Zhou, Dong Xu, Liyou Wu, Dorothea K. Thompson, Igor B. Rogozin, Amudhan Venkateswaran, Min Zhai, Eugene V. Koonin, Alexander S. Beliaev, Marina V. Omelchenko, Michael J. Daly, Elena K. Gaidamakova, Yongqing Liu, Kira S. Makarova, and Julia Stair
- Subjects
DNA Repair ,Transcription, Genetic ,DNA repair ,DNA damage ,Molecular Sequence Data ,Transcriptome ,Radiation, Ionizing ,Operon ,Deinococcus ,Amino Acid Sequence ,Gene ,Genetics ,chemistry.chemical_classification ,DNA ligase ,Multidisciplinary ,biology ,Sequence Homology, Amino Acid ,DNA replication ,Deinococcus radiodurans ,Gene Expression Regulation, Bacterial ,Biological Sciences ,biology.organism_classification ,chemistry ,Sequence Alignment ,Cell Division - Abstract
Deinococcus radiodurans R1 (DEIRA) is a bacterium best known for its extreme resistance to the lethal effects of ionizing radiation, but the molecular mechanisms underlying this phenotype remain poorly understood. To define the repertoire of DEIRA genes responding to acute irradiation (15 kGy), transcriptome dynamics were examined in cells representing early, middle, and late phases of recovery by using DNA microarrays covering ≈94% of its predicted genes. At least at one time point during DEIRA recovery, 832 genes (28% of the genome) were induced and 451 genes (15%) were repressed 2-fold or more. The expression patterns of the majority of the induced genes resemble the previously characterized expression profile of recA after irradiation. DEIRA recA, which is central to genomic restoration after irradiation, is substantially up-regulated on DNA damage (early phase) and down-regulated before the onset of exponential growth (late phase). Many other genes were expressed later in recovery, displaying a growth-related pattern of induction. Genes induced in the early phase of recovery included those involved in DNA replication, repair, and recombination, cell wall metabolism, cellular transport, and many encoding uncharacterized proteins. Collectively, the microarray data suggest that DEIRA cells efficiently coordinate their recovery by a complex network, within which both DNA repair and metabolic functions play critical roles. Components of this network include a predicted distinct ATP-dependent DNA ligase and metabolic pathway switching that could prevent additional genomic damage elicited by metabolism-induced free radicals.
- Published
- 2003
181. ESTMAP: A system for expressed sequence tags mapping on genomic sequences
- Author
-
Igor B. Rogozin and Luciano Milanesi
- Subjects
genomic DNA ,Gene prediction ,Molecular Sequence Data ,Biomedical Engineering ,Pharmaceutical Science ,Medicine (miscellaneous) ,Bioengineering ,Genomics ,Sequence alignment ,Computational biology ,database searches ,Biology ,Genome ,Homology (biology) ,DNA sequencing ,User-Computer Interface ,Sequence Homology, Nucleic Acid ,Databases, Genetic ,expressed sequence tag (EST) ,Electrical and Electronic Engineering ,Genetics ,Expressed Sequence Tags ,Expressed sequence tag ,Base Sequence ,Gene Expression Profiling ,Chromosome Mapping ,food and beverages ,alignment ,Sequence Analysis, DNA ,Computer Science Applications ,repetitive elements ,GenBank ,gene prediction ,Sequence Alignment ,Software ,Biotechnology - Abstract
The completion of a number of large genome sequencing projects emphasizes the importance of protein-coding gene predictions. Most of the problems associated with gene prediction are caused by the complex exon-intron structures commonly found in eukaryotic genomes. However, information from homologous sequences can significantly improve the accuracy of the prediction. In particular, expressed sequence tags (ESTs) are very useful for this purpose, since currently existing EST collections are very large. We developed an ESTMAP system, which utilizes homology searches against a database of repetitive elements using the RepeatView program and the EST Division of GenBank using the BLASTN program. ESTMAP extracts "exact" matches with EST sequences (>95% of homology) from BLASTN output file and predicts introns in DNA comparing ESTs and a query sequence. ESTMAP is implemented as a part of the WebGene system (http://www.cnr.it/webgene).
- Published
- 2003
182. Congruent evolution of different classes of non-coding DNA in prokaryotic genomes
- Author
-
Alexey N. Spiridonov, Kira S. Makarova, Jodie Yin, Yuri I. Wolf, Eugene V. Koonin, Igor B. Rogozin, Darren A. Natale, and Roman L. Tatusov
- Subjects
Genetics ,Comparative genomics ,Genomics ,Bacterial genome size ,Articles ,Biology ,Noncoding DNA ,Genes, Archaeal ,Evolution, Molecular ,Intergenic region ,Evolutionary biology ,Cot analysis ,Genes, Bacterial ,Genome, Archaeal ,C-value ,Operon ,Human genome ,DNA, Intergenic ,Databases, Nucleic Acid ,Genome, Bacterial - Abstract
Prokaryotic genomes are considered to be ‘wall-to-wall’ genomes, which consist largely of genes for proteins and structural RNAs, with only a small fraction of the genomic DNA allotted to intergenic regions, which are thought to typically contain regulatory signals. The majority of bacterial and archaeal genomes contain 6–14% non-coding DNA. Significant positive correlations were detected between the fraction of non-coding DNA and inter- and intra-operonic distances, suggesting that different classes of non-coding DNA evolve congruently. In contrast, no correlation was found between any of these characteristics of non-coding sequences and the number of genes or genome size. Thus, the non-coding regions and the gene sets in prokaryotes seem to evolve in different regimes. The evolution of non-coding regions appears to be determined primarily by the selective pressure to minimize the amount of non-functional DNA, while maintaining essential regulatory signals, because of which the content of non-coding DNA in different genomes is relatively uniform and intra- and inter-operonic non-coding regions evolve congruently. In contrast, the gene set is optimized for the particular environmental niche of the given microbe, which results in the lack of correlation between the gene number and the characteristics of non-coding regions.
- Published
- 2002
183. Genome trees and the tree of life
- Author
-
Nick V. Grishin, Igor B. Rogozin, Yuri I. Wolf, and Eugene V. Koonin
- Subjects
Genetics ,Genome ,Phylogenetic tree ,Gene Transfer, Horizontal ,Models, Genetic ,Tree of life (biology) ,Genetic transfer ,Computational Biology ,Genetic Variation ,Biology ,Horizontal gene transfer in evolution ,Maximum parsimony ,Evolution, Molecular ,Tree (data structure) ,Evolutionary biology ,Sequence Analysis, Protein ,Computational phylogenetics ,Tree rearrangement ,Animals ,Humans ,Sequence Alignment ,Phylogeny - Abstract
Genome comparisons indicate that horizontal gene transfer and differential gene loss are major evolutionary phenomena that, at least in prokaryotes, involve a large fraction, if not the majority, of genes. The extent of these events casts doubt on the feasibility of constructing a ‘Tree of Life’, because the trees for different genes often tell different stories. However, alternative approaches to tree construction that attempt to determine tree topology on the basis of comparisons of complete gene sets seem to reveal a phylogenetic signal that supports the three-domain evolutionary scenario and suggests the possibility of delineation of previously undetected major clades of prokaryotes. If the validity of these whole-genome approaches to tree building is confirmed by analyses of numerous new genomes, which are currently being sequenced at an increasing rate, it would seem that the concept of a universal ‘species’ tree is still appropriate. However, this tree should be reinterpreted as a prevailing trend in the evolution of genome-scale gene sets rather than as a complete picture of evolution.
- Published
- 2002
184. Microevolutionary genomics of bacteria
- Author
-
I. King Jordan, Yuri I. Wolf, Igor B. Rogozin, and Eugene V. Koonin
- Subjects
Genetics ,Nonsynonymous substitution ,Comparative genomics ,Genome evolution ,Bacteria ,Genomics ,Biology ,Genome ,Biological Evolution ,Gene Duplication ,Proteome ,Minimal genome ,Gene ,Ecology, Evolution, Behavior and Systematics ,Genome, Bacterial - Abstract
The availability of multiple complete genome sequences from the same species can facilitate attempts to systematically address basic questions in genome evolution. We refer to such efforts as “microevolutionary genomics”. We report the results of comparative analyses of complete intraspecific genome (and proteome) sequences from four bacterial species—Chlamydophila pneumoniae, Escherichia coli, Helicobacter pylori and Neisseria meningitidis. Comparisons of average synonymous (Ks) and nonsynonymous (Ka) substitution rates were used to assess the influence of various biological factors on the rate of protein evolution. For example, E. coli experiences the most intense purifying selection of the species analyzed, and this may be due to the relatively larger population size of this species. In addition, essential genes were shown to be more evolutionarily conserved than nonessential genes in E. coli and duplicated genes have higher rates of evolution than unique genes for all species studied except C. pneumoniae. Different functional categories of genes were shown to evolve at significantly different rates emphasizing the role of category-specific functional constraints in determining evolutionary rates. Finally, functionally characterized genes tend to be conserved between strains, while uncharacterized genes are over-represented among the unique, strain-specific genes. This suggests the possibility that nonessential genes are responsible for driving the evolutionary diversification between strains.
- Published
- 2002
185. Correlation of somatic hypermutation specificity and A-T base pair substitution errors by DNA polymerase eta during copying of a mouse immunoglobulin kappa light chain transgene
- Author
-
Thomas A. Kunkel, Youri I. Pavlov, Alexey P. Galkin, Anna Y. Aksenova, Igor B. Rogozin, Christina Rada, and Fumio Hanaoka
- Subjects
Transcription, Genetic ,DNA polymerase ,Base pair ,Molecular Sequence Data ,Somatic hypermutation ,Mice, Transgenic ,DNA-Directed DNA Polymerase ,Immunoglobulin light chain ,chemistry.chemical_compound ,Immunoglobulin kappa-Chains ,Mice ,Animals ,Transgenes ,Gene ,Base Pairing ,Genetics ,Multidisciplinary ,biology ,DNA synthesis ,Base Sequence ,Point mutation ,Adenine ,Biological Sciences ,Molecular biology ,Thymine ,chemistry ,Genetic Techniques ,biology.protein ,Somatic Hypermutation, Immunoglobulin ,Monte Carlo Method - Abstract
To test the hypothesis that inaccurate DNA synthesis by mammalian DNA polymerase η (pol η) contributes to somatic hypermutation (SHM) of Ig genes, we measured the error specificity of mouse pol η during synthesis of each strand of a mouse Ig κ light chain transgene. We then compared the results to the base substitution specificity of SHM of this same gene in the mouse. The in vitro and in vivo base substitution spectra shared a number of common features. A highly significant correlation was observed for overall substitutions at A-T pairs but not for substitutions at G-C pairs. Sixteen mutational hotspots at A-T pairs observed in vivo were also found in spectra generated by mouse pol η in vitro. The correlation was strongest for errors made by pol η during synthesis of the non-transcribed strand, but it was also observed for synthesis of the transcribed strand. These facts, and the distribution of substitutions generated in vivo, support the hypothesis that pol η contributes to SHM of Ig genes at A-T pairs via short patches of low fidelity DNA synthesis of both strands, but with a preference for the non-transcribed strand.
- Published
- 2002
186. Essential genes are more evolutionarily conserved than are nonessential genes in bacteria
- Author
-
Igor B. Rogozin, I. King Jordan, Eugene V. Koonin, and Yuri I. Wolf
- Subjects
Genetics ,Gram-negative bacteria ,Genes, Essential ,Letter ,Helicobacter pylori ,Directional selection ,Saccharomyces cerevisiae ,Biology ,Neisseria meningitidis ,biology.organism_classification ,medicine.disease_cause ,Conserved sequence ,Evolution, Molecular ,Genes, Bacterial ,Gram-Negative Bacteria ,medicine ,Escherichia coli ,Minimal genome ,Gene ,Genetics (clinical) ,Caenorhabditis elegans ,Conserved Sequence ,Genome, Bacterial - Abstract
The “knockout-rate” prediction holds that essential genes should be more evolutionarily conserved than are nonessential genes. This is because negative (purifying) selection acting on essential genes is expected to be more stringent than that for nonessential genes, which are more functionally dispensable and/or redundant. However, a recent survey of evolutionary distances between Saccharomyces cerevisiae and Caenorhabditis elegans proteins did not reveal any difference between the rates of evolution for essential and nonessential genes. An analysis of mouse and rat orthologous genes also found that essential and nonessential genes evolved at similar rates when genes thought to evolve under directional selection were excluded from the analysis. In the present study, we combine genomic sequence data with experimental knockout data to compare the rates of evolution and the levels of selection for essential versus nonessential bacterial genes. In contrast to the results obtained for eukaryotic genes, essential bacterial genes appear to be more conserved than are nonessential genes over both relatively short (microevolutionary) and longer (macroevolutionary) time scales.
- Published
- 2002
187. The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens
- Author
-
L. Aravind, Nikolai Polushin, Vera V. Shakhova, Galina I. Belova, Roman L. Tatusov, Yuri I. Wolf, Kira S. Makarova, Olga V. Shcherbinina, Eugene V. Koonin, Darren A. Natale, Sergei A. Kozyavkin, Igor B. Rogozin, Alexei I. Slesarev, Katja V. Mezhevaya, Karl O. Stetter, and Andrei Malykh
- Subjects
Genetics ,Methanococcus ,Multidisciplinary ,biology ,Phylogenetic tree ,Base Sequence ,Molecular Sequence Data ,Euryarchaeota ,Biological Sciences ,biology.organism_classification ,Genome ,DNA sequencing ,Hyperthermophile ,genomic DNA ,Genome, Archaeal ,Operon ,Gene ,Phylogeny ,Archaea - Abstract
We have determined the complete 1,694,969-nt sequence of the GC-rich genome of Methanopyrus kandleri by using a whole direct genome sequencing approach. This approach is based on unlinking of genomic DNA with the ThermoFidelase version of M. kandleri topoisomerase V and cycle sequencing directed by 2′-modified oligonucleotides (Fimers). Sequencing redundancy (3.3×) was sufficient to assemble the genome with less than one error per 40 kb. Using a combination of sequence database searches and coding potential prediction, 1,692 protein-coding genes and 39 genes for structural RNAs were identified. M. kandleri proteins show an unusually high content of negatively charged amino acids, which might be an adaptation to the high intracellular salinity. Previous phylogenetic analysis of 16S RNA suggested that M. kandleri belonged to a very deep branch, close to the root of the archaeal tree. However, genome comparisons indicate that, in both trees constructed using concatenated alignments of ribosomal proteins and trees based on gene content, M. kandleri consistently groups with other archaeal methanogens. M. kandleri shares the set of genes implicated in methanogenesis and, in part, its operon organization with Methanococcus jannaschii and Methanothermobacter thermoautotrophicum. These findings indicate that archaeal methanogens are monophyletic. A distinctive feature of M. kandleri is the paucity of proteins involved in signaling and regulation of gene expression. Also, M. kandleri appears to have fewer genes acquired via lateral transfer than other archaea. These features might reflect the extreme habitat of this organism.
- Published
- 2002
188. Analysis of phylogenetically reconstructed mutational spectra in human mitochondrial DNA control region
- Author
-
Miroslava Derenko, V. B. Berikov, Igor B. Rogozin, and Boris Malyarchuk
- Subjects
Mitochondrial DNA ,DNA polymerase ,Population ,Molecular Sequence Data ,Human mitochondrial genetics ,DNA, Mitochondrial ,Genetics ,Humans ,education ,Genetics (clinical) ,Phylogeny ,Recombination, Genetic ,education.field_of_study ,biology ,Base Sequence ,Human evolutionary genetics ,Haplotype ,Genetic Variation ,Complementarity Determining Regions ,Human genetics ,Hypervariable region ,Pyrimidines ,Haplotypes ,Mutagenesis ,Mutation ,biology.protein - Abstract
Analysis of mutations in mitochondrial DNA is an important issue in population and evolutionary genetics. To study spontaneous base substitutions in human mitochondrial DNA we reconstructed the mutational spectra of the hypervariable segments I and II (HVS I and II) using published data on polymorphisms from various human populations. An excess of pyrimidine transitions was found in both the HVS I and II regions. By means of classification analysis numerous mutational hotspots were revealed in these spectra. Context analysis of hotspots revealed a complex influence of neighboring bases on mutagenesis in the HVS I region. Further statistical analysis suggested that transient misalignment dislocation mutagenesis operating in monotonous runs of nucleotides plays an important role in generating base substitutions in mitochondrial DNA and defines context properties of mtDNA. Our results suggest that dislocation mutagenesis in HVS I and II is a fingerprint of errors produced by DNA polymerase gamma in the course of human mitochondrial DNA replication.
- Published
- 2002
189. Double-strand breaks in DNA during somatic hypermutation of Ig genes: cause or consequence?
- Author
-
Thomas A. Kunkel, Igor B. Rogozin, and Youri I. Pavlov
- Subjects
Double strand ,Genetics ,Mutation ,DNA damage ,Immunology ,Vdj recombination ,Models, Immunological ,Somatic hypermutation ,DNA ,Biology ,medicine.disease_cause ,chemistry.chemical_compound ,Mice ,chemistry ,medicine ,Immunology and Allergy ,Animals ,Humans ,Somatic Hypermutation, Immunoglobulin ,Gene ,DNA Damage - Published
- 2002
190. LINE-1 element in the vole Microtus subarvalis
- Author
-
Tatyana B. Nesterova, Suren M. Zakian, Nikolai G. Kholodilov, Vladimir Mayorov, Michael Mullokandov, Igor B. Rogozin, and Olga V. Cheryaukene
- Subjects
Genetics ,Transposable element ,Base Sequence ,Arvicolinae ,Molecular Sequence Data ,DNA ,Biology ,biology.organism_classification ,Genome ,genomic DNA ,DNA Transposable Elements ,Animals ,Direct repeat ,Vole ,ORFS ,Repeated sequence ,Microtus ,Repetitive Sequences, Nucleic Acid - Abstract
Highly repetitive DNA sequences of the LINE-1 (L1) family, which are interspersed in the genomes of various organisms, are among the many transposable elements deserving attention. L1 elements are widely distributed throughout the mammalian genomes (Burton et al. 1986). Their occurrence was first demonstrated in man and other mammals. Subsequently, L1-like elements were found to occur in insects and higher plants (DiNosera and Sakaki 1990). No definite role has yet been assigned to L1 elements. Their presence in virtually all organisms suggests that they might be important for the structural organization and function of the genome. The sequence features of L1 include two open reading frames (ORFs), the short 5' proximal ORF1 and the about four times longer 3' proximal ORF2. L1 element contains a poly(A) tail at the 3' end and is flanked by short direct repeats (Fanning and Singer 1987). The sequence similarity between L1 and retroviruses and related transposable elements suggests that the predicted ORF2 encoded protein is involved in the transposition of L1 element. The support for the involvement of L1-encoded proteins in L1 transposition has come from in vitro studies (Leibold et al. 1990). More recently it was shown that human L1 element encodes reverse transcriptase activity (Dombroski et al. 1991; Mathias et al. 1991). We describe here genomic L1 element detected in voles of the Microtus genus. Genomic DNA was prepared from vole liver according to standard procedures (Henry et al. 1990); complete digestion was performed with endonuclease EcoRI. Comparative restriction analysis of genomic DNAs from five vole species is shown in Fig. 1. Clearly, the number and distribution of bands corresponding to DNA repeats are specific to each of the species. The intense band is common to the four species, and it represents the same
- Published
- 1993
191. Evolution of the mouse polyubiquitin-C gene
- Author
-
Fyodor A. Kondrashov, Andrey A. Perelygin, Margo A. Brinton, and Igor B. Rogozin
- Subjects
Molecular Sequence Data ,Chinese hamster ,Evolution, Molecular ,Mice ,Ubiquitin ,Cricetinae ,Genetics ,Animals ,Humans ,Crossing Over, Genetic ,Molecular Biology ,Gene ,Ecology, Evolution, Behavior and Systematics ,Phylogeny ,Recombination, Genetic ,Polymorphism, Genetic ,biology ,Base Sequence ,Polyubiquitin-C ,DNA ,Sequence Analysis, DNA ,biology.organism_classification ,Noncoding DNA ,Rats ,Gene Expression Regulation ,biology.protein ,Ubiquitin C ,Sequence Alignment ,Homogenization (biology) - Abstract
The polymeric ubiquitin (poly-u) genes are composed of tandem 228-bp repeats with no spacer sequences between individual monomer units. Ubiquitin is one of the most conserved proteins known to date, and the individual units within a number of poly-u genes are significantly more similar to each other than would be expected if each unit evolved independently. It has been proposed that the rather striking similarity among poly-u monomers in some lineages is caused by a series of homogenization events. Here we report the sequences of the polyubiquitin-C (Ubc) genes in two mouse strains. Analysis of these sequences, as well as those of the previously reported Chinese hamster and rat poly-u genes, supports the assertion that the homogenization of the ubiquitin-C gene in rodents is due to unequal crossing-over events. The sequence divergence of noncoding DNA was used to estimate the frequency of unequal crossing-over events (6.3 × 10⁻⁵ events per generation) in the Ubc gene, as well as to provide evidence of apparent selection in the poly-u gene.
- Published
- 2001
192. Somatic mutation hotspots correlate with DNA polymerase eta error spectrum
- Author
-
Katarzyna Bebenek, Igor B. Rogozin, Youri I. Pavlov, Toshiro Matsuda, and Thomas A. Kunkel
- Subjects
DNA polymerase ,Base pair ,Immunology ,Amino Acid Motifs ,Molecular Sequence Data ,Somatic hypermutation ,DNA polymerase eta ,DNA-Directed DNA Polymerase ,medicine.disease_cause ,chemistry.chemical_compound ,Mice ,Consensus Sequence ,Consensus sequence ,medicine ,Immunology and Allergy ,Animals ,Humans ,Amino Acid Sequence ,Base Pairing ,Polymerase ,Genetics ,Mutation ,biology ,Base Sequence ,Genes, Immunoglobulin ,DNA ,Molecular biology ,chemistry ,biology.protein - Abstract
Mutational spectra analysis of 15 immunoglobulin genes suggested that consensus motifs RGYW and WA were universal descriptors of somatic hypermutation. Highly mutable sites, "hotspots", that matched WA were preferentially found in one DNA strand and RGYW hotspots were found in both strands. Analysis of base-substitution hotspots in DNA polymerase error spectra showed that 33 of 36 hotspots in the human polymerase eta spectrum conformed to the WA consensus. This and four other characteristics of polymerase eta substitution specificity suggest that errors introduced by this enzyme during synthesis of the nontranscribed DNA strand in variable regions may contribute to strand-specific somatic hypermutagenesis of immunoglobulin genes at A-T base pairs.
- Published
- 2001
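The RGYW and WA consensus motifs named in entry 192 use standard IUPAC nucleotide codes (R = A or G, Y = C or T, W = A or T). The following is a minimal sketch, not the authors' method, of how such motifs could be located on both strands of a sequence. The function names and the example sequence are hypothetical and chosen only for illustration.

```python
import re

# IUPAC nucleotide codes needed for the two motifs (R = purine, Y = pyrimidine, W = A or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]"}


def motif_positions(seq, motif):
    """Return 0-based start positions of all (possibly overlapping) motif matches."""
    pattern = "".join(IUPAC[ch] for ch in motif)
    # A lookahead lets overlapping occurrences be reported individually.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]


def reverse_complement(seq):
    """Return the reverse complement, used here to scan the opposite strand."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


example = "AGCTA"  # hypothetical sequence, not taken from any spectrum in the paper
for name in ("RGYW", "WA"):
    print(name, "top strand:", motif_positions(example, name))
    print(name, "opposite strand:", motif_positions(reverse_complement(example), name))
```

Positions reported for the opposite strand are relative to the reverse-complemented sequence, so they would need to be mapped back to top-strand coordinates before comparing hotspots between strands.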
193. Comparative study and prediction of DNA fragments associated with various elements of the nuclear matrix
- Author
-
Galina V. Glazko, Mikhail V. Glazkov, and Igor B. Rogozin
- Subjects
DNA, Plant ,Satellite DNA ,Molecular Sequence Data ,Biophysics ,Biology ,Biochemistry ,Combinatorics ,chemistry.chemical_compound ,Structural Biology ,Genetics ,Animals ,Humans ,Nuclear Matrix ,Sequence ,Base Sequence ,Oryza ,Mars Exploration Program ,DNA ,Sequence Analysis, DNA ,Telomere ,Linear discriminant analysis ,Nuclear matrix ,Globins ,Synaptonemal complex ,chemistry ,Nuclear lamina ,Edible Grain ,Software - Abstract
Scaffold/matrix-associated region (S/MAR) sequences are DNA regions that are attached to the nuclear matrix, and participate in many cellular processes. The nuclear matrix is a complex structure consisting of various elements. In this paper we compared frequencies of simple nucleotide motifs in S/MAR sequences and in sequences extracted directly from various nuclear matrix elements, such as nuclear lamina, cores of rosette-like structures, synaptonemal complex. Multivariate linear discriminant analysis revealed significant differences between these sequences. Based on this result we have developed a program, ChrClass (Win/NT version, ftp.bionet.nsc.ru/pub/biology/chrclass/chrclass.zip), for the prediction of the regions associated with various elements of the nuclear matrix in a query sequence. Subsequently, several test samples were analyzed by using two S/MAR prediction programs (ChrClass and MAR-Finder) and a simple MRS criterion (S/MAR recognition signature) indicating the presence of S/MARs. Some overlap between the predictions of all MAR prediction tools has been found. Simultaneous use of the ChrClass, MRS criterion and MAR-Finder programs may help to obtain a more clear-cut picture of S/MAR distribution in a query sequence. In general, our results suggest that the proportion of missed S/MARs is lower for ChrClass, whereas the proportion of wrong S/MARs is lower for MAR-Finder and MRS.
- Published
- 2001
194. Characterization of the Genomic Xist Locus in Rodents Reveals Conservation of Overall Gene Structure and Tandem Repeats but Rapid Evolution of Unique Sequence
- Author
-
Sergey Ya. Slobodyanyuk, Alexander I. Shevchenko, Nikolay N. Kolesnikov, Tatyana B. Nesterova, Marina E. Pavlova, Suren M. Zakian, Eugene A. Elisaphenko, Neil Brockdorff, Colette M. Johnston, and Igor B. Rogozin
- Subjects
Genetic Markers ,Male ,Letter ,RNA, Untranslated ,X Chromosome ,Transcription, Genetic ,Sequence analysis ,Molecular Sequence Data ,Locus (genetics) ,Animals, Wild ,Biology ,X-inactivation ,Conserved sequence ,Evolution, Molecular ,Mice ,Tandem repeat ,Genetics ,Animals ,Humans ,Gene ,3' Untranslated Regions ,Genetics (clinical) ,X chromosome ,Cells, Cultured ,Conserved Sequence ,Base Sequence ,Arvicolinae ,Chromosome Mapping ,DNA ,Genes ,Tandem Repeat Sequences ,XIST ,Female ,RNA, Long Noncoding ,5' Untranslated Regions ,Transcription Factors - Abstract
The Xist locus plays a central role in the regulation of X chromosome inactivation in mammals, although its exact mode of action remains to be elucidated. Evolutionary studies are important in identifying conserved genomic regions and defining their possible function. Here we report cloning, sequence analysis, and detailed characterization of the Xist gene from four closely related species of common vole (field mouse), Microtus arvalis. Our analysis reveals that there is overall conservation of Xist gene structure both between different vole species and relative to mouse and human Xist/XIST. Within transcribed sequence, there is significant conservation over five short regions of unique sequence and also over Xist-specific tandem repeats. The majority of unique sequences, however, are evolving at an unexpectedly high rate. This is also evident from analysis of flanking sequences, which reveals a very high rate of rearrangement and invasion of dispersed repeats. We discuss these results in the context of Xist gene function and evolution. [The sequence data described in this paper have been submitted to the GenBank data library under accession nos. AJ310127–AJ310130 and AJ311670.]
- Published
- 2001
195. Cloning and functional analysis of SEL1L promoter region, a pancreas-specific gene
- Author
-
Giulia Malferrari, Loris Bernard, Claudio Sorio, Massimo Zollo, Monica Cattaneo, Igor B. Rogozin, Ida Biunno, and Aldo Scarpa
- Subjects
Transcription, Genetic ,TATA box ,Response element ,Molecular Sequence Data ,CAAT box ,Biology ,Mice ,SEL1L promoter ,Transcription (biology) ,Genes, Reporter ,Genetics ,Animals ,Pancreas-Specific Gene ,Cloning, Molecular ,Luciferases ,Promoter Regions, Genetic ,Molecular Biology ,Transcription factor ,Pancreas ,Base Sequence ,Activator (genetics) ,Intracellular Signaling Peptides and Proteins ,Proteins ,Promoter ,Cell Biology ,General Medicine ,DNA ,HNF1B ,Molecular biology ,pancreas - Abstract
We examined the promoter activity of SEL1L, the human ortholog of the C. elegans gene sel-1, a negative regulator of LIN-12/NOTCH receptor proteins. To understand the relation in SEL1L transcription pattern observed in different epithelial cells, we determined the transcription start site and sequenced the 5' flanking region. Sequence analysis revealed the presence of consensus promoter elements--GC boxes and a CAAT box--but the absence of a TATA motif. Potential binding sites for transcription factors that are involved in tissue-specific gene expression were identified, including: activator protein-2 (AP-2), hepatocyte nuclear factor-3 (HNF3 beta), homeobox Nkx2-5 and GATA-1. Transcription activity of the TATA-less SEL1L promoter was analyzed by transient transfection using luciferase reporter gene constructs. A core basal promoter of 302 bp was sufficient for constitutive promoter activity in all the cell types studied. This genomic fragment contains a CAAT and several GC boxes. The activity of the SEL1L promoter was considerably higher in mouse pancreatic beta cells (beta TC3) than in several human pancreatic neoplastic cell lines; an even greater reduction of its activity was observed in cells of nonpancreatic origin. These results suggest that SEL1L promoter may be a useful tool in gene therapy applications for pancreatic pathologies.
- Published
- 2001
196. Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context
- Author
-
Alexey S. Kondrashov, Eugene V. Koonin, Igor B. Rogozin, and Yuri I. Wolf
- Subjects
Comparative genomics ,Genetics ,Genome evolution ,Computational Biology ,Genome project ,Bacterial genome size ,Templates, Genetic ,Biology ,Genome ,Genes, Archaeal ,Evolution, Molecular ,Genes, Bacterial ,Genome, Archaeal ,Gene density ,Gene Order ,Operon ,Minimal genome ,Sequence Alignment ,Genetics (clinical) ,Conserved Sequence ,Genome, Bacterial ,Reference genome - Abstract
Gene order in prokaryotes is conserved to a much lesser extent than protein sequences. Only several operons, primarily those that code for physically interacting proteins, are conserved in all or most of the bacterial and archaeal genomes. Nevertheless, even the limited conservation of operon organization that is observed can provide valuable evolutionary and functional clues through multiple genome comparisons. A program for constructing gapped local alignments of conserved gene strings in two genomes was developed. The statistical significance of the local alignments was assessed using Monte Carlo simulations. Sets of local alignments were generated for all pairs of completely sequenced bacterial and archaeal genomes, and for each genome a template-anchored multiple alignment was constructed. In most pairwise genome comparisons, Synechocystis sp to 24% for the minimal genome of Mycoplasma genitalium, and 23% in Thermotoga maritima. The coverage of the archaeal genomes was only slightly lower than that of bacterial genomes. The majority of the conserved gene strings are known operons, with the ribosomal superoperon being the top-scoring string in most genome comparisons. However, in some of the bacterial–archaeal pairs, the superoperon is rearranged to the extent that other operons, primarily those subject to horizontal transfer, show the greatest level of conservation, such as the archaeal-type H+-ATPase operon or ABC-type transport cassettes. The level of gene order conservation among prokaryotic genomes was compared to the cooccurrence of genomes in clusters of orthologous genes (COGs) and to the conservation of protein sequences themselves. Only limited correlation was observed between these evolutionary variables. 
Gene order conservation shows a much lower variance than the cooccurrence of genomes in COGs, which indicates that intragenome homogenization via recombination occurs in evolution much faster than intergenome homogenization via horizontal gene transfer and lineage-specific gene loss. The potential of using template-anchored multiple-genome alignments for predicting functions of uncharacterized genes was quantitatively assessed. Functions were predicted or significantly clarified for ∼90 COGs (∼4% of the total of 2414 analyzed COGs). The most significant predictions were obtained for the poorly characterized archaeal genomes; these include a previously uncharacterized restriction-modification system, a nuclease-helicase combination implicated in DNA repair, and the probable archaeal counterpart of the eukaryotic exosome. Multiple genome alignments are a resource for studies on operon rearrangement and disruption, which is central to our understanding of the evolution of prokaryotic genomes. Because of the rapid evolution of the gene order, the potential of genome alignment for prediction of gene functions is limited, but nevertheless, such predictions significantly complement the results obtained through protein sequence and structure analysis.
- Published
- 2001
197. Mutagenic specificity of the base analog 6-N-hydroxylaminopurine in the LYS2 gene of yeast Saccharomyces cerevisiae
- Author
-
Vladimir V. Kulikov, Olga V. Tarunina, Vladimir N. Noskov, Igor B. Rogozin, Youri I. Pavlov, Yury O. Chernoff, and Irina L. Derkatch
- Subjects
Health, Toxicology and Mutagenesis ,Saccharomyces cerevisiae ,Nonsense mutation ,DNA Mutational Analysis ,Base analog ,chemistry.chemical_compound ,L-Aminoadipate-Semialdehyde Dehydrogenase ,Suppression, Genetic ,stomatognathic system ,Genetics ,Directionality ,Molecular Biology ,Gene ,biology ,Adenine ,Nucleic acid sequence ,biology.organism_classification ,Molecular biology ,Aldehyde Oxidoreductases ,Antisense Elements (Genetics) ,Phenotype ,chemistry ,Codon, Nonsense ,Mutagenesis ,Release factor ,DNA ,Mutagens - Abstract
We used the LYS2 gene mutational system to study mutation specificity of the base analog 6-N-hydroxylaminopurine (HAP) in yeast. We characterized phenotypes of mutations using codon-specific nonsense suppressors and the test employing inactivation of the release factor Sup35 due to overexpression and formation of prion-like derivative [PSI]. We have shown that HAP induces predominantly nonsense mutations. While the tests using codon-specific nonsense suppressors allowed identification of only about 50% of nonsense mutations, all the nonsense mutations were identified in the test with defective Sup35. We determined and analyzed the spectrum of HAP-induced nucleotide changes in two regions of the gene. HAP induces predominantly GC→AT transitions at hotspots in the central position of the trinucleotides GGA or AGG. Directionality of these transitions is consistent with the idea that initial dHAPMP incorporation in the leading strand is more genetically dangerous than in the lagging DNA strand. We revealed a specific context inhibitory for HAP mutagenesis, a “T” at the −1 position relative to the mutation site.
- Published
- 2001
198. Similarity pattern analysis in mutational distributions
- Author
-
Nikita N. Khromov-Borisov, Frederick J. de Serres, João Antonio Pêgas Henriques, and Igor B. Rogozin
- Subjects
DNA, Bacterial ,Databases, Factual ,Health, Toxicology and Mutagenesis ,DNA Mutational Analysis ,Pattern analysis ,Biology ,Mutational spectra ,Homogeneous clusters ,Similarity (network science) ,Genetics ,Escherichia coli ,Point Mutation ,Molecular Biology ,Sequence Deletion ,Contingency table ,Homogeneity (statistics) ,X-Rays ,Ms analysis ,Chromosome Mapping ,Computational Biology ,Lac Operon ,Mutagenesis ,Mutation ,Sufficient statistic ,Mutagens - Abstract
The validity and applicability of the statistical procedure, similarity pattern analysis (SPAN), to the study of mutational distributions (MDs) was demonstrated with two sets of data. The first comprised mutational spectra (MS) for 697 GC to AT transitions produced with eight alkylating agents (AAs) in the lacI gene of Escherichia coli. The second was a recently summarized data set on the distributions of 11562 spontaneous, radiation- and chemical-induced forward mutations in the ad-3 region of heterokaryon 12 of Neurospora crassa. They were analyzed as large two-way contingency tables (CTs) where two kinds of profiles were compared: site (or genotypic class) profiles and origin (or mutagen) profiles. To measure similarity (homogeneity) between any pair of profiles, the relevant sufficient statistic, the Kastenbaum–Hirotsu squared distance (KHi²), was used. Collapsing the similar profiles into distinct internally homogeneous clusters named ‘collapsets’ revealed their similarity pattern. To facilitate the procedure, the computer program, COLLAPSE, was developed. The results of SPAN for the lacI spectra were found comparable with the results of their previous analysis with two multivariate statistical methods, the factor and cluster analyses. In the ad-3 data set, five collapsets were revealed among origin profiles (OPs): (I) ENU=4NQO=4HAQO=FANFT=SQ18506; (II) AF-2=EI=MMS=DEP; (III) ETO=UV; (IV) AHA=PROCARB; and (V) He ions=protons. Moreover, the previous observation that MDs are dose-dependent was confirmed for X-ray-induced MDs. Profiles induced with the low doses of X-rays are similar to those induced with ⁸⁵Sr, and profiles induced with the medium X-ray doses to those induced with protons and He ions. Evaluated similarities appear to be rather reasonable: mutagens with similar mode of action induce similar MDs. Similarity pattern revealed among genotypic class profiles (GCPs) seems to be also interpretable. 
When supplemented with descriptive cluster analysis, SPAN appears to be a fruitful methodology in MS analysis.
- Published
- 1999
199. GeneBuilder: interactive in silico prediction of gene structure
- Author
-
Igor B. Rogozin, Luciano Milanesi, and D. D'Angelo
- Subjects
Statistics and Probability ,Databases, Factual ,Gene prediction ,In silico ,Molecular Sequence Data ,Genomics ,Biology ,Biochemistry ,Homology (biology) ,User-Computer Interface ,Coding region ,Animals ,Humans ,Amino Acid Sequence ,Codon ,Molecular Biology ,Gene ,Genetics ,Expressed Sequence Tags ,Expressed sequence tag ,Internet ,Binding Sites ,Base Sequence ,DNA ,TATA Box ,Computer Science Applications ,Computational Mathematics ,Nested gene ,Computational Theory and Mathematics ,Genes ,CpG Islands ,Sequence Alignment ,Software ,Transcription Factors - Abstract
MOTIVATION: Prediction of gene structure in newly sequenced DNA becomes very important in large genome sequencing projects. This problem is complicated due to the exon-intron structure of eukaryotic genes and because gene expression is regulated by many different short nucleotide domains. In order to be able to analyse the full gene structure in different organisms, it is necessary to combine information about potential functional signals (promoter region, splice sites, start and stop codons, 3' untranslated region) together with the statistical properties of coding sequences (coding potential), information about homologous proteins, ESTs and repeated elements. RESULTS: We have developed the GeneBuilder system which is based on prediction of functional signals and coding regions by different approaches in combination with similarity searches in proteins and EST databases. The potential gene structure models are obtained by using a dynamic programming method. The program permits the use of several parameters for gene structure prediction and refinement. During gene model construction, selecting different exon homology levels with a protein sequence selected from a list of homologous proteins can improve the accuracy of the gene structure prediction. In the case of low homology, GeneBuilder is still able to predict the gene structure. The GeneBuilder system has been tested by using the standard set (Burset and Guigo, Genomics, 34, 353-367, 1996) and the performances are: 0.89 sensitivity and 0.91 specificity at the nucleotide level. The total correlation coefficient is 0.88. AVAILABILITY: The GeneBuilder system is implemented as a part of the WebGene at the URL: http://www.itba.mi.cnr.it/webgene and TRADAT (TRAncription Database and Analysis Tools) launcher URL: http://www.itba.mi.cnr.it/tradat.
- Published
- 1999
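Entry 199 reports sensitivity, specificity and a correlation coefficient at the nucleotide level. As a rough illustration of how such figures are commonly computed in gene-prediction benchmarks, the sketch below assumes the Burset and Guigó convention, in which "specificity" is TP / (TP + FP), and it assumes that both coding and noncoding positions occur in the annotation and in the prediction. The function name and the toy strings are hypothetical and are not data from the paper.

```python
from math import sqrt


def nucleotide_accuracy(annotated, predicted):
    """Compare two equal-length strings of '1' (coding) and '0' (noncoding).

    Returns (sensitivity, specificity, correlation coefficient), with
    specificity taken as TP / (TP + FP), an assumption stated in the text above.
    """
    if len(annotated) != len(predicted):
        raise ValueError("annotation and prediction must have the same length")
    pairs = list(zip(annotated, predicted))
    tp = sum(a == "1" and p == "1" for a, p in pairs)  # coding, predicted coding
    tn = sum(a == "0" and p == "0" for a, p in pairs)  # noncoding, predicted noncoding
    fp = sum(a == "0" and p == "1" for a, p in pairs)  # noncoding, predicted coding
    fn = sum(a == "1" and p == "0" for a, p in pairs)  # coding, predicted noncoding
    sensitivity = tp / (tp + fn)
    specificity = tp / (tp + fp)
    cc = (tp * tn - fn * fp) / sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    return sensitivity, specificity, cc


# Toy example: a 4-nt coding region predicted with a 1-nt shift.
sn, sp, cc = nucleotide_accuracy("0011110000", "0001111000")
print(f"Sn={sn:.2f} Sp={sp:.2f} CC={cc:.2f}")
```

In the toy example three of the four coding nucleotides are recovered and one noncoding nucleotide is wrongly called coding, so sensitivity and specificity are both 0.75.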
200. Regression trees for analysis of mutational spectra in nucleotide sequences
- Author
-
Igor B. Rogozin and V. B. Berikov
- Subjects
Statistics and Probability ,Alkylating Agents ,DNA Mutational Analysis ,Context (language use) ,Biology ,medicine.disease_cause ,Biochemistry ,medicine ,Escherichia coli ,Nucleotide ,Molecular Biology ,Sequence (medicine) ,chemistry.chemical_classification ,Genetics ,Mutation ,Base Sequence ,Models, Genetic ,Nucleic acid sequence ,Regression analysis ,Models, Theoretical ,Regression ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,chemistry ,Lac Operon ,Ethylnitrosourea ,Regression Analysis ,Algorithms ,Software - Abstract
MOTIVATION: The study and comparison of mutational spectra is an important problem in molecular biology, because these spectra often reveal important features of the action of various mutagens and the functioning of repair/replication enzymes. As is known, mutability varies significantly along nucleotide sequences: mutations often concentrate at certain positions in a sequence, otherwise termed 'hotspots'. RESULTS: Herein, we propose a regression analysis method based on the use of regression trees in order to analyse the influence of nucleotide context on the occurrence of such hotspots. The REGRT program developed has been tested on simulated and real mutational spectra. For the G:C→T:A mutational spectra induced by Sn1 alkylating agents (nine spectra), the prediction accuracy was 0.99. AVAILABILITY: The REGRT program is available upon request from V.Berikov.
- Published
- 1999