373 results on '"Mikhail S. Gelfand"'
Search Results
202. Segmentation of yeast DNA using hidden Markov models
- Author
Leonid Peshkin and Mikhail S. Gelfand
- Subjects
Statistics and Probability ,Saccharomyces cerevisiae ,Biology ,Biochemistry ,Intergenic region ,Sliding window protocol ,Segmentation ,Hidden Markov model ,Molecular Biology ,Genetics ,Models, Statistical ,Models, Genetic ,Markov chain ,business.industry ,Small number ,Pattern recognition ,DNA ,Sequence Analysis, DNA ,Quantitative Biology::Genomics ,Markov Chains ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Artificial intelligence ,business ,Algorithms ,Level of detail ,Natural language ,Hydrogen
- Abstract
Motivation: Compositionally homogeneous segments of genomic DNA often correspond to meaningful biological units. Simple sliding window analysis is usually insufficient for compositional segmentation of natural sequences. Hidden Markov models (HMM) with a small number of states are a natural language for description of compositional properties of chromosome-size DNA sequences. Results: The algorithms were applied to yeast Saccharomyces cerevisiae chromosomes (YC) I, III, IV, VI and IX. The optimal number of HMM states is found to be four. The optimal four-state HMMs for all chromosomes are very similar, as well as the reconstructed segmentations. In most cases the models with k + 1 states are obtained by ‘splitting’ one of the states in the model with k states, and the corresponding increase of the level of detail in segmentation. The high AT states usually correspond to intergenic regions. We also explore the model’s likelihood landscape and analyze the dynamics of the optimization process, thus addressing the problem of reliability of the obtained optima and efficiency of the algorithms. Availability: The system is available on request from the first author. Contact: ldp@cs.brown.edu
- Published
- 1999
203. Frequent Alternative Splicing of Human Genes
- Author
Mikhail S. Gelfand, James W. Fickett, and Andrey A. Mironov
- Subjects
Expressed Sequence Tags ,Genetics ,Letter ,Base Sequence ,Databases, Factual ,Molecular Sequence Data ,Alternative splicing ,Intron ,Sequence alignment ,Exons ,Biology ,Introns ,Alternative Splicing ,Contig Mapping ,Exon ,RNA splicing ,Humans ,Protein Isoforms ,Coding region ,Human genome ,5' Untranslated Regions ,Sequence Alignment ,Gene ,Genetics (clinical)
- Abstract
Alternative splicing can produce variant proteins and expression patterns as different as the products of different genes, yet the prevalence of alternative splicing has not been quantified. Here the spliced alignment algorithm was used to make a first inventory of exon-intron structures of known human genes using EST contigs from the TIGR Human Gene Index. The results on any one gene may be incomplete and will require verification, yet the overall trends are significant. Evidence of alternative splicing was shown in 35% of genes and the majority of splicing events occurred in 5′ untranslated regions, suggesting wide occurrence of alternative regulation. Most of the alternative splices of coding regions generated additional protein domains rather than alternating domains.
- Published
- 1999
204. Statistical Analysis of the Exon-Intron Structure of Higher and Lower Eukaryote Genes
- Author
Mikhail S. Gelfand and E. V. Kriventseva
- Subjects
Databases, Factual ,RNA Splicing ,Arabidopsis ,Saccharomyces cerevisiae ,Exon shuffling ,Exon ,Structural Biology ,Animals ,Humans ,splice ,Molecular Biology ,Gene ,Genetics ,Models, Statistical ,Splice site mutation ,Models, Genetic ,biology ,Intron ,Exons ,Sequence Analysis, DNA ,General Medicine ,biology.organism_classification ,Introns ,Aspergillus ,Eukaryotic Cells ,Pyrimidines ,RNA splicing ,Eukaryote ,Apicomplexa ,Algorithms
- Abstract
Statistics of the exon-intron structure and splicing sites of several diverse eukaryotes were studied. The yeast exon-intron structures have a number of unique features. A yeast gene usually has at most one intron. The branch site is strongly conserved, whereas the polypyrimidine tract is short. Long yeast introns tend to have stronger acceptor sites. In other species the branch site is less conserved and often cannot be determined. In non-yeast samples there is an almost universal correlation between lengths of neighboring exons (all samples excluding protists) and correlation between lengths of neighboring introns (human, Drosophila, protists). On average, first introns are longer, and anomalously long introns are usually first introns in a gene. There is a universal preference for exons and exon pairs with the (total) length divisible by 3. Introns positioned between codons are preferred, whereas those positioned between the first and second positions in a codon are avoided. The choice of A or G at the third position of the intron (the donor splice sites generally prefer purines at this position) is correlated with the overall GC-composition of the gene. In all samples the dinucleotide AG is avoided in the region preceding the acceptor site.
- Published
- 1999
205. Computer analysis of transcription regulatory patterns in completely sequenced bacterial genomes
- Author
Mikhail A. Roytberg, Mikhail S. Gelfand, Andrey A. Mironov, and Eugene V. Koonin
- Subjects
Transcription, Genetic ,Operon ,Bacterial genome size ,Biology ,Arginine ,Response Elements ,Regulon ,Conserved sequence ,Bacterial Proteins ,Escherichia coli ,Genetics ,Gene ,Conserved Sequence ,Phylogeny ,Purr ,Binding Sites ,Arginine transport ,Base Sequence ,Tryptophan ,Computational Biology ,Gene Expression Regulation, Bacterial ,Haemophilus influenzae ,Genes, Bacterial ,Purines ,Regulatory sequence ,Tyrosine ,Carrier Proteins ,Genome, Bacterial ,Research Article
- Abstract
Recognition of transcription regulation sites (operators) is a hard problem in computational molecular biology. In most cases, small sample size and low degree of sequence conservation preclude the construction of reliable recognition rules. We suggest an approach to this problem based on simultaneous analysis of several related genomes. It appears that as long as a gene coding for a transcription regulator is conserved in the compared bacterial genomes, the regulation of the respective group of genes (regulons) also tends to be maintained. Thus a gene can be confidently predicted to belong to a particular regulon in case not only itself, but also its orthologs in other genomes have candidate operators in the regulatory regions. This provides for a greater sensitivity of operator identification as even relatively weak signals are likely to be functionally relevant when conserved. We use this approach to analyze the purine (PurR), arginine (ArgR) and aromatic amino acid (TrpR and TyrR) regulons of Escherichia coli and Haemophilus influenzae. Candidate binding sites in regulatory regions of the respective H.influenzae genes are identified, a new family of purine transport proteins predicted to belong to the PurR regulon is described, and probable regulation of arginine transport by ArgR is demonstrated. Differences in the regulation of some orthologous genes in E.coli and H.influenzae, in particular the apparent lack of the autoregulation of the purine repressor gene in H.influenzae, are demonstrated.
- Published
- 1999
206. Starts of bacterial genes: estimating the reliability of computer predictions
- Author
Andrei V. Mironov, Mikhail S. Gelfand, and Dmitrij Frishman
- Subjects
DNA, Bacterial ,Codon, Initiator ,Computational biology ,Biology ,Genome ,Evolution, Molecular ,Eukaryotic translation ,Start codon ,Genetics ,Gene ,Phylogeny ,Binding Sites ,Base Sequence ,Phylogenetic tree ,Reproducibility of Results ,Shine-Dalgarno sequence ,General Medicine ,Ribosomal RNA ,Ribosomal binding site ,RNA, Bacterial ,Genes, Bacterial ,RNA, Ribosomal ,Ribosomes ,Sequence Alignment ,Algorithms ,Software
- Abstract
Exact mapping of gene starts is an important problem in the computer-assisted functional analysis of newly sequenced prokaryotic genomes. We describe an algorithm for finding ribosomal binding sites without a learning sample. This algorithm is particularly useful for analysis of genomes with little or no experimentally mapped genes. There is a clear correlation between the ribosomal binding site (RBS) properties of a given genome and the potential gene start prediction accuracy. This correlation is of considerable predictive power and may be useful for estimating the expected success of future genome analysis efforts. We also demonstrate that the RBS properties depend on the phylogenetic position of a genome.
- Published
- 1999
207. 20 Active chromatin regions are sufficient to define borders of topologically associated domains in D. melanogaster interphase chromosomes
- Author
Mikhail S. Gelfand, Yuri Y. Shevelyov, Sergey V. Ulyanov, Alexey A. Gavrilov, Ekaterina Khrameeva, and Sergey V. Razin
- Subjects
biology ,Structural Biology ,Melanogaster ,Interphase ,General Medicine ,Drosophila (subgenus) ,biology.organism_classification ,Molecular Biology ,Chromatin ,Domain (software engineering) ,Cell biology
- Abstract
In Drosophila, interphase chromosomes are organized in topologically associated domains (TADs) within which chromatin-chromatin interactions are frequent, while interactions across domain borders a...
- Published
- 2015
208. Combining diverse evidence for gene recognition in completely sequenced bacterial genomes
- Author
Hans-Werner Mewes, Andrey A. Mironov, Dmitrij Frishman, and Mikhail S. Gelfand
- Subjects
Genetics ,Candidate gene ,Databases, Factual ,Gene prediction ,Codon, Initiator ,Sequence alignment ,Sequence Analysis, DNA ,Bacterial genome size ,Biology ,Genome ,Open Reading Frames ,Open reading frame ,Bacterial Proteins ,GenBank ,parasitic diseases ,Escherichia coli ,Sequence Alignment ,Gene ,Algorithms ,Genome, Bacterial ,Software ,Research Article ,Bacillus subtilis
- Abstract
Analysis of a newly sequenced bacterial genome starts with identification of protein-coding genes. Functional assignment of proteins requires the exact knowledge of protein N-termini. We present a new program ORPHEUS that identifies candidate genes and accurately predicts gene starts. The analysis starts with a database similarity search and identification of reliable gene fragments. The latter are used to derive statistical characteristics of protein-coding regions and ribosome-binding sites and to predict the complete set of genes in the analyzed genome. In a test on Bacillus subtilis and Escherichia coli genomes, the program correctly identified 93.3% (resp. 96.3%) of experimentally annotated genes longer than 100 codons described in the PIR-International database, and for these genes 96.3% (83.9%) of starts were predicted exactly. Furthermore, 98.9% (99.1%) of genes longer than 100 codons annotated in GenBank were found, and 92.9% (75.7%) of predicted starts coincided with the feature table description. Finally, for the complete gene complements of B.subtilis and E.coli, including genes shorter than 100 codons, gene prediction accuracy was 88.9 and 87.1%, respectively, with 94.2 and 76.7% starts coinciding with the existing annotation.
- Published
- 1998
209. Avoidance of palindromic words in bacterial and archaeal genomes: a close connection with restriction enzymes
- Author
Mikhail S. Gelfand and Eugene V. Koonin
- Subjects
Methanococcus ,Chloroplasts ,Biology ,Cyanobacteria ,Genome ,DNA Restriction-Modification Enzymes ,Mycoplasma ,Plasmid ,Genetics ,Prophage ,Palindromic sequence ,Organelles ,Bacteria ,Base Sequence ,Models, Genetic ,biology.organism_classification ,Archaea ,Haemophilus influenzae ,Mitochondria ,Restriction enzyme ,Mobile genetic elements ,Genome, Bacterial ,Mathematics ,Research Article
- Abstract
Short palindromic sequences (4, 5 and 6 bp palindromes) are avoided at a statistically significant level in the genomes of several bacteria, including the completely sequenced Haemophilus influenzae and Synechocystis sp. genomes and in the complete genome of the archaeon Methanococcus jannaschii. In contrast, there is only moderate avoidance of palindromes in the small genome of the bacterium Mycoplasma genitalium and no detectable avoidance in the genomes of chloroplasts and mitochondria. The sites for type II restriction-modification enzymes detected in the given species tend to be among the most avoided palindromes in a particular genome, indicating a direct connection between the avoidance of short oligonucleotide words and restriction-modification systems with the respective specificity. Palindromes corresponding to sites for restriction enzymes from other species are also avoided, albeit less significantly, suggesting that in the course of evolution bacterial DNA has been exposed to a wide spectrum of restriction enzymes, probably as the result of lateral transfer mediated by mobile genetic elements, such as plasmids and prophages. Palindromic words appear to accumulate in DNA once it becomes isolated from restriction-modification systems, as demonstrated by the case of organellar genomes. By combining these observations with protein sequence analysis, we show that the most avoided 4-palindrome and the most avoided 6-palindrome in the archaeon M.jannaschii are likely to be recognition sites for two novel restriction-modification systems.
- Published
- 1997
210. A Novel Intra-U1 snRNP Cross-Regulation Mechanism: Alternative Splicing Switch Links U1C and U1-70K Expression
- Author
Ekaterina Khrameeva, Patrick Le Querrec, Tanja Dorothe Rösel-Hillgärtner, Lee-Hsueh Hung, Mikhail S. Gelfand, and Albrecht Bindereif
- Subjects
Cancer Research ,Embryo, Nonmammalian ,lcsh:QH426-470 ,RNA Splicing ,DNA Mutational Analysis ,Biology ,Ribonucleoprotein, U1 Small Nuclear ,Protein splicing ,Genetics ,RNA Precursors ,Animals ,Humans ,snRNP ,Amino Acid Sequence ,Molecular Biology ,Genetics (clinical) ,Ecology, Evolution, Behavior and Systematics ,Zebrafish ,Ribonucleoprotein ,Binding Sites ,Base Sequence ,Alternative splicing ,Intron ,Molecular biology ,Cell biology ,lcsh:Genetics ,Alternative Splicing ,Gene Knockdown Techniques ,RNA splicing ,Spliceosomes ,RNA Splice Sites ,Small nuclear ribonucleoprotein ,Minigene ,Research Article ,HeLa Cells
- Abstract
The U1 small nuclear ribonucleoprotein (snRNP)-specific U1C protein participates in 5′ splice site recognition and regulation of pre-mRNA splicing. Based on an RNA-Seq analysis in HeLa cells after U1C knockdown, we found a conserved, intra-U1 snRNP cross-regulation that links U1C and U1-70K expression through alternative splicing and U1 snRNP assembly. To investigate the underlying regulatory mechanism, we combined mutational minigene analysis, in vivo splice-site blocking by antisense morpholinos, and in vitro binding experiments. Alternative splicing of U1-70K pre-mRNA creates the normal (exons 7–8) and a non-productive mRNA isoform, whose balance is determined by U1C protein levels. The non-productive isoform is generated through a U1C-dependent alternative 3′ splice site, which requires an adjacent cluster of regulatory 5′ splice sites and binding of intact U1 snRNPs. As a result of nonsense-mediated decay (NMD) of the non-productive isoform, U1-70K mRNA and protein levels are down-regulated, and U1C incorporation into the U1 snRNP is impaired. U1-70K/U1C-deficient particles are assembled, shifting the alternative splicing balance back towards productive U1-70K splicing, and restoring assembly of intact U1 snRNPs. Taken together, we established a novel feedback regulation that controls U1-70K/U1C homeostasis and ensures correct U1 snRNP assembly and function.
Author Summary: The accurate removal of intervening sequences (introns) from precursor messenger RNAs (pre-mRNAs) represents an essential step in the expression of most eukaryotic protein-coding genes. Alternative splicing can create from a single primary transcript various mature mRNAs with diverse, sometimes even antagonistic, biological functions. Many human diseases are based on alternative-splicing defects, and most interestingly, certain defects are caused by mutations in general splicing factors that participate in each splicing event.
To address the question of how a general splicing factor can regulate alternative splicing events, here we investigated the regulatory role of the U1C protein, a specific component of the U1 small nuclear ribonucleoprotein (snRNP) and important in initial 5′ splice site recognition. Our RNA-Seq analysis demonstrated that U1C affects more than 300 cases of alternative splicing in the human system. One U1C target, U1-70K, appeared to be particularly interesting, because both protein products are components of the U1 snRNP and functionally depend on each other. Analyzing the mechanistic basis of this intra-U1 snRNP cross-regulation, we discovered a U1C-dependent alternative splicing switch in the U1-70K pre-mRNA that regulates U1-70K expression. In sum, this feedback loop controls and links U1C and U1-70K homeostasis to guarantee correct U1 snRNP assembly and function.
- Published
- 2013
211. The Spore Differentiation Pathway in the Enteric Pathogen Clostridium difficile
- Author
Marc Monot, Bruno Dupuy, Isabelle Martin-Verstraete, Olga Soutourina, Pavel V. Shelyakin, Mónica Serrano, Laure Saujet, Fátima C. Pereira, Mikhail S. Gelfand, Adriano O. Henriques, Cellule Pasteur, Université Paris Diderot - Paris 7 (UPD7)-PRES Sorbonne Paris Cité, Pathogénèse des Bactéries Anaérobies / Pathogenesis of Bacterial Anaerobes (PBA (U-Pasteur_6)), Institut Pasteur [Paris]-Université Paris Diderot - Paris 7 (UPD7), Universidade Nova de Lisboa = NOVA University Lisbon (NOVA), Institute for Information Transmission Problems (IITP), Russian Academy of Sciences [Moscow] (RAS), Faculty of Bioengineering and Bioinformatics [Moscow], Lomonosov Moscow State University (MSU), This work was supported by grants ERA-PTG/SAU/0002/2008 (ERA-NET PathoGenoMics) to AOH and BD, and Pest-C/EQB/LA0006/2011 from the 'Fundação para a Ciência e a Tecnologia' (FCT) to AOH, FCP (SFRH/BD/45459/08) and MS (SFRH/BPD/36328/2007) were the recipient of a doctoral and a post-doctoral fellowship, respectively, from the FCT. MSG and PVS were partially supported by the Russian Academy of Sciences via program in Molecular and Cellular Biology, RFBR grant 12-04-91332 and State Contracts No. 8049 and 8283., European Project: 109584,FCT::,ERA-PTG/2008,ERA-PTG/SAU/0002/2008(2009), Institut Pasteur [Paris] (IP)-Université Paris Diderot - Paris 7 (UPD7), PRES Sorbonne Paris Cité-Université Paris Diderot - Paris 7 (UPD7), and Universidade Nova de Lisboa (NOVA)
- Subjects
Diarrhea ,Cancer Research ,lcsh:QH426-470 ,Transcription, Genetic ,Cellular differentiation ,Sigma Factor ,Bacillus subtilis ,Evolution, Molecular ,03 medical and health sciences ,Sigma factor ,Gene expression ,Genetics ,Humans ,Promoter Regions, Genetic ,Molecular Biology ,Gene ,Genetics (clinical) ,Ecology, Evolution, Behavior and Systematics ,030304 developmental biology ,Regulation of gene expression ,Spores, Bacterial ,0303 health sciences ,biology ,030306 microbiology ,Clostridioides difficile ,fungi ,[SDV.BBM.BM]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Molecular biology ,Cell Differentiation ,Gene Expression Regulation, Bacterial ,Clostridium difficile ,biology.organism_classification ,lcsh:Genetics ,Regulon ,Genome, Bacterial ,Research Article ,Protein Binding
- Abstract
Clostridium difficile, a Gram positive, anaerobic, spore-forming bacterium is an emergent pathogen and the most common cause of nosocomial diarrhea. Although transmission of C. difficile is mediated by contamination of the gut by spores, the regulatory cascade controlling spore formation remains poorly characterized. During Bacillus subtilis sporulation, a cascade of four sigma factors, σF and σG in the forespore and σE and σK in the mother cell governs compartment-specific gene expression. In this work, we combined genome wide transcriptional analyses and promoter mapping to define the C. difficile σF, σE, σG and σK regulons. We identified about 225 genes under the control of these sigma factors: 25 in the σF regulon, 97 σE-dependent genes, 50 σG-governed genes and 56 genes under σK control. A significant fraction of genes in each regulon is of unknown function but new candidates for spore coat proteins could be proposed as being synthesized under σE or σK control and detected in a previously published spore proteome. SpoIIID of C. difficile also plays a pivotal role in the mother cell line of expression repressing the transcription of many members of the σE regulon and activating sigK expression. Global analysis of developmental gene expression under the control of these sigma factors revealed deviations from the B. subtilis model regarding the communication between mother cell and forespore in C. difficile. We showed that the expression of the σE regulon in the mother cell was not strictly under the control of σF despite the fact that the forespore product SpoIIR was required for the processing of pro-σE. In addition, the σK regulon was not controlled by σG in C. difficile in agreement with the lack of pro-σK processing. 
This work is one key step to obtain new insights about the diversity and evolution of the sporulation process among Firmicutes.
Author Summary: Clostridium difficile, a major cause of antibiotic-associated diarrhea, produces resistant spores that facilitate the persistence of this bacterium in the environment including hospitals. Its transmission is mediated by contamination of gut by spores. Understanding how this complex developmental process is regulated is fundamental to decipher C. difficile transmission and pathogenesis. The regulatory cascade controlling sporulation that involves four sigma factors, σF and σG in the forespore and σE and σK in the mother cell remains poorly characterized in C. difficile. By combining transcriptome analysis and promoter mapping, we identified genes expressed under the specific control of each sigma factor. Among sporulation-controlled proteins detected in spore, we can propose candidates for new spore coat proteins important for spore resistance. We also showed differences in the intercompartment communication between forespore and mother cell in C. difficile compared to the Bacillus subtilis model. In C. difficile, we observed that the activation of the σE regulon was partially independent of σF and that the σK regulon was not controlled by σG. Our finding suggests that the C. difficile sporulation process might be more ancestral compared to that of B. subtilis.
- Published
- 2013
212. What is to be done about Russian science?
- Author
Mikhail S. Gelfand
- Subjects
Politics ,Multidisciplinary ,History ,Science ,Academies and Institutes ,Library science ,Federal Government ,Research management ,Research Personnel ,Russia
- Published
- 2013
213. Functional implications of splicing polymorphisms in the human genome
- Author
Sergey Naumenko, Mikhail S. Gelfand, Georgii A. Bazykin, Yerbol Z. Kurmangaliyev, and Roman A. Sutormin
- Subjects
Protein Conformation ,RNA Splicing ,Single-nucleotide polymorphism ,Biology ,Genome ,Polymorphism, Single Nucleotide ,Evolution, Molecular ,Exon ,Mice ,Databases, Genetic ,Genetics ,Animals ,Humans ,splice ,Molecular Biology ,Gene ,Genetics (clinical) ,Splice site mutation ,Genome, Human ,Genetic Variation ,Proteins ,General Medicine ,RNA splicing ,Human genome ,RNA Splice Sites ,Sequence Alignment
- Abstract
Proper splicing is often crucial for gene functioning and its disruption may be strongly deleterious. Nevertheless, even the essential for splicing canonical dinucleotides of the splice sites are often polymorphic. Here, we use data from The 1000 Genomes Project to study single-nucleotide polymorphisms (SNPs) in the canonical dinucleotides. Splice sites carrying SNPs are enriched in weakly expressed genes and in rarely used alternative splice sites. Genes with disrupted splice sites tend to have low selective constraint, and the splice sites disrupted by SNPs are less likely to be conserved in mouse. Furthermore, SNPs are enriched in splice sites whose effects on gene function are minor: splice sites located outside of protein-coding regions, in shorter exons, closer to the 3'-ends of proteins, and outside of functional protein domains. Most of these effects are more pronounced for high-frequency SNPs. Despite these trends, many of the polymorphic sites may still substantially affect the function of the corresponding genes. A number of the observed splice site-disrupting SNPs, including several high-frequency ones, were found among mutations described in OMIM.
- Published
- 2013
214. Widespread splicing changes in human brain development and aging
- Author
Yi-Ping Phoebe Chen, Wei Chen, Zheng Yan, Hongyi Yang, Ning Fu, Liu He, Philipp Khaitovich, Zhibin Ning, Yuan Yuan, Mehmet Somel, Xiling Liu, Pavel V. Mazin, Jieyi Xiong, Rong Zeng, Na Li, Mingshuang Li, Mikhail S. Gelfand, Yuhui Hu, and Xiaoyu Zhang
- Subjects
Adult ,Aging ,Adolescent ,brain ,RNA Splicing ,Prefrontal Cortex ,RNA-Seq ,Biology ,General Biochemistry, Genetics and Molecular Biology ,Article ,Exon ,alternative splicing ,Young Adult ,Cerebellum ,medicine ,Animals ,Humans ,human ,Prefrontal cortex ,Child ,skin and connective tissue diseases ,Gene ,development ,Aged ,Genetics ,Aged, 80 and over ,General Immunology and Microbiology ,Sequence Analysis, RNA ,Applied Mathematics ,Gene Expression Profiling ,Alternative splicing ,Infant, Newborn ,Infant ,Proteins ,Human brain ,Exons ,Middle Aged ,Macaca mulatta ,Gene expression profiling ,medicine.anatomical_structure ,Computational Theory and Mathematics ,Cardiovascular and Metabolic Diseases ,Child, Preschool ,RNA splicing ,sense organs ,RNA-seq ,General Agricultural and Biological Sciences ,Information Systems
- Abstract
Human brain transcriptome analysis revealed widespread age-related splicing changes in the prefrontal cortex and cerebellum. While most of the splicing changes take place in development, approximately one-third of them extend into aging.
More than one-third of genes expressed in the human brain change splicing with age. Approximately 30% of observed splicing changes occur in aging. Age-related splicing patterns are largely conserved between the human and macaque brains. High frequency of intron retention events suggests the role of nonsense-mediated decay in age-related gene expression regulation.
While splicing differences between tissues, sexes and species are well documented, little is known about the extent and the nature of splicing changes that take place during human or mammalian development and aging. Here, using high-throughput transcriptome sequencing, we have characterized splicing changes that take place during whole human lifespan in two brain regions: prefrontal cortex and cerebellum. Identified changes were confirmed using independent human and rhesus macaque RNA-seq data sets, exon arrays and PCR, and were detected at the protein level using mass spectrometry. Splicing changes across lifespan were abundant in both of the brain regions studied, affecting more than a third of the genes expressed in the human brain. Approximately 15% of these changes differed between the two brain regions. Across lifespan, splicing changes followed discrete patterns that could be linked to neural functions, and associated with the expression profiles of the corresponding splicing factors. More than 60% of all splicing changes represented a single splicing pattern reflecting preferential inclusion of gene segments potentially targeting transcripts for nonsense-mediated decay in infants and elderly.
- Published
- 2013
215. Gene recognition via spliced sequence alignment
- Author
Mikhail S. Gelfand, Pavel A. Pevzner, and Andrey A. Mironov
- Subjects
Genetics ,Time Factors ,Multidisciplinary ,Databases, Factual ,RNA Splicing ,Alternative splicing ,Computational gene ,Sequence alignment ,Exons ,Models, Theoretical ,Biology ,Biological Evolution ,Pattern Recognition, Automated ,Exon ,Genes ,Codon usage bias ,RNA splicing ,Animals ,Humans ,Human genome ,Mathematical Computing ,Sequence Alignment ,Gene ,Algorithms ,Research Article
- Abstract
Gene recognition is one of the most important problems in computational molecular biology. Previous attempts to solve this problem were based on statistics, and applications of combinatorial methods for gene recognition were almost unexplored. Recent advances in large-scale cDNA sequencing open a way toward a new approach to gene recognition that uses previously sequenced genes as a clue for recognition of newly sequenced genes. This paper describes a spliced alignment algorithm and software tool that explores all possible exon assemblies in polynomial time and finds the multiexon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully recognizes genes even in the case of short exons or exons with unusual codon usage; we also report correct assemblies for genes with more than 10 exons. On a test sample of human genes with known mammalian relatives, the average correlation between the predicted and actual proteins was 99%. The algorithm correctly reconstructed 87% of genes and the rare discrepancies between the predicted and real exon-intron structures were caused either by short (less than 5 amino acids) initial/terminal exons or by alternative splicing. Moreover, the algorithm predicts human genes reasonably well when the homologous protein is nonvertebrate or even prokaryotic. The surprisingly good performance of the method was confirmed by extensive simulations: in particular, with target proteins at 160 accepted point mutations (PAM) (25% similarity), the correlation between the predicted and actual genes was still as high as 95%.
- Published
- 1996
216. Functional diversification of ROK-family transcriptional regulators of sugar catabolism in the Thermotogae phylum
- Author
Andrei L. Osterman, Mikhail S. Gelfand, Xiaoqing Li, Dmitry A. Rodionov, and Marat D. Kazanov
- Subjects
Genetics ,DNA, Bacterial ,Binding Sites ,biology ,Bacteria ,Repressor ,Gene Regulation, Chromatin and Epigenetics ,biology.organism_classification ,Regulon ,Open reading frame ,Bacterial Proteins ,Phylogenetics ,Thermotoga maritima ,Transcriptional regulation ,Carbohydrate Metabolism ,Nucleotide Motifs ,Gene ,Transcription factor ,Genome, Bacterial ,Phylogeny ,Transcription Factors
- Abstract
Large and functionally heterogeneous families of transcription factors have complex evolutionary histories. What shapes specificities toward effectors and DNA sites in paralogous regulators is a fundamental question in biology. Bacteria from the deep-branching lineage Thermotogae possess multiple paralogs of the repressor, open reading frame, kinase (ROK) family regulators that are characterized by carbohydrate-sensing domains shared with sugar kinases. We applied an integrated genomic approach to study functions and specificities of regulators from this family. A comparative analysis of 11 Thermotogae genomes revealed novel mechanisms of transcriptional regulation of the sugar utilization networks, DNA-binding motifs and specific functions. Reconstructed regulons for seven groups of ROK regulators were validated by DNA-binding assays using purified recombinant proteins from the model bacterium Thermotoga maritima. All tested regulators demonstrated specific binding to their predicted cognate DNA sites, and this binding was inhibited by specific effectors, mono- or disaccharides from their respective sugar catabolic pathways. By comparing ligand-binding domains of regulators with structurally characterized kinases from the ROK family, we elucidated signature amino acid residues determining sugar-ligand regulator specificity. Observed correlations between signature residues and the sugar-ligand specificities provide the framework for structure functional classification of the entire ROK family.
- Published
- 2012
217. Evolution of the Exon-Intron Structure in Ciliate Genomes
- Author
-
Mikhail S. Gelfand and Vladyslav S. Bondarenko
- Subjects
Protein Structure Comparison ,Evolutionary Genetics ,0301 basic medicine ,Paramecium ,Mature messenger RNA ,lcsh:Medicine ,Ciliate Protozoans ,Oxytricha ,Biochemistry ,Exon ,Macromolecular Structure Analysis ,lcsh:Science ,Genome Evolution ,Protozoans ,Genetics ,Genome ,Multidisciplinary ,biology ,Genomics ,Exons ,Group II intron ,Stylonychia ,RNA splicing ,Codon, Terminator ,Sequence Analysis ,Research Article ,Protein Structure ,RNA Splicing ,Genome Complexity ,Research and Analysis Methods ,Molecular Evolution ,Evolution, Molecular ,03 medical and health sciences ,Group I catalytic intron ,RNA, Messenger ,Ciliophora ,Molecular Biology Techniques ,Sequencing Techniques ,Molecular Biology ,Evolutionary Biology ,030102 biochemistry & molecular biology ,lcsh:R ,Organisms ,Intron ,Biology and Life Sciences ,Computational Biology ,Proteins ,biology.organism_classification ,Introns ,Organismal Evolution ,Alternative Splicing ,030104 developmental biology ,Tetrahymena ,lcsh:Q ,Sequence Alignment - Abstract
A typical eukaryotic gene is comprised of alternating stretches of regions, exons and introns, retained in and spliced out of a mature mRNA, respectively. Although the length of introns may vary substantially among organisms, a large fraction of genes contains short introns in many species. Notably, some Ciliates (Paramecium and Nyctotherus) possess only ultra-short introns, around 25 bp long. In Paramecium, ultra-short introns with length divisible by three (3n) are under strong evolutionary pressure and have a high frequency of in-frame stop codons, which, in the case of intron retention, cause premature termination of mRNA translation and consequent degradation of the mis-spliced mRNA by the nonsense-mediated decay mechanism. Here, we analyzed introns in five genera of Ciliates: Paramecium, Tetrahymena, Ichthyophthirius, Oxytricha, and Stylonychia. Introns can be classified into two length classes in Tetrahymena and Ichthyophthirius (with means 48 bp, 69 bp, and 55 bp, 64 bp, respectively), but, surprisingly, comprise three distinct length classes in Oxytricha and Stylonychia (with means 33-35 bp, 47-51 bp, and 78-80 bp). In most ranges of the intron lengths, 3n introns are underrepresented and have a high frequency of in-frame stop codons in all studied species. Introns of Paramecium, Tetrahymena, and Ichthyophthirius are preferentially located at the 5' and 3' ends of genes, whereas introns of Oxytricha and Stylonychia are strongly skewed towards the 5' end. Analysis of evolutionary conservation shows that, in each studied genome, a significant fraction of intron positions is conserved between the orthologs, but intron lengths are not correlated between the species. In summary, our study provides a detailed characterization of introns in several genera of Ciliates and highlights some of their distinctive properties, which, together, indicate that splicing spellchecking is a universal and evolutionarily conserved process in the biogenesis of short introns in various representatives of Ciliates.
- Published
- 2016
218. History of chromosome rearrangements reflects the spatial organization of yeast chromosomes
- Author
-
Mikhail S. Gelfand, Geoffrey Fudenberg, Ekaterina Khrameeva, and Leonid A. Mirny
- Subjects
Gene Rearrangement ,0301 basic medicine ,Genetics ,biology ,DNA repair ,Saccharomyces cerevisiae ,Chromosome ,biology.organism_classification ,Biochemistry ,Genome ,Article ,Computer Science Applications ,Chromatin ,Telomere ,03 medical and health sciences ,chemistry.chemical_compound ,030104 developmental biology ,chemistry ,Yeasts ,Centromere ,Chromosomes, Fungal ,Molecular Biology ,Algorithms ,DNA - Abstract
Three-dimensional (3D) organization of genomes affects critical cellular processes such as transcription, replication, and deoxyribonucleic acid (DNA) repair. While previous studies have investigated the natural role the 3D organization plays in limiting a possible set of genomic rearrangements following DNA repair, the influence of specific organizational principles on this process, particularly over longer evolutionary time scales, remains relatively unexplored. In budding yeast S. cerevisiae, chromosomes are organized into a Rabl-like configuration, with clustered centromeres and telomeres tethered to the nuclear periphery. Hi-C data for S. cerevisiae show that a consequence of this Rabl-like organization is that regions equally distant from centromeres are more frequently in contact with each other, between arms of both the same and different chromosomes. Here, we detect rearrangement events in Saccharomyces species using an automatic approach, and observe increased rearrangement frequency between regions with higher contact frequencies. Together, our results underscore how specific principles of 3D chromosomal organization can influence evolutionary events. Funding: National Institutes of Health (U.S.) (Grant GM114190)
- Published
- 2016
219. Introduction to selected papers from MCCMB 2015
- Author
-
Mikhail S. Gelfand
- Subjects
0301 basic medicine ,03 medical and health sciences ,030104 developmental biology ,Biology ,Molecular Biology ,Biochemistry ,Computer Science Applications - Published
- 2016
220. Sequencing Potential of Nested Strand Hybridization
- Author
-
Oleg I. Razgulyaev, Anatoly R. Rubinov, Mikhail S. Gelfand, and Alexander Chetverin
- Subjects
Genetics ,Base Sequence ,Oligonucleotide ,Nucleic Acid Hybridization ,DNA ,Models, Theoretical ,Biology ,Computational Mathematics ,chemistry.chemical_compound ,Oligodeoxyribonucleotides ,Computational Theory and Mathematics ,Sequencing by hybridization ,chemistry ,Modeling and Simulation ,Nucleic Acid Conformation ,Oligonucleotide Probes ,Molecular Biology ,Mathematics - Abstract
The capability of the sequencing by nested strand hybridization (SNSH) method to sequence unseparated pools of DNA fragments was assessed in computer simulation experiments. The results demonstrate the high resolving power of the method and its tolerance to false-positive errors. We determine the optimal proportion between the fragment length and the pool size at a given length of oligonucleotide probes, compare SNSH to the standard SBH, and suggest the best experimental setting for the special case of sequencing of long isolated fragments.
- Published
- 1995
221. Changes in snoRNA and snRNA abundance in the human, chimpanzee, macaque and mouse brain
- Author
-
Philipp Khaitovich, Mikhail S. Gelfand, Yuriy D. Korostelev, Zheng Yan, Dingding Han, Bin Zhang, Ning-Yi Shao, Yi-Ping Phoebe Chen, Boris M. Velichkovsky, and Ekaterina Khrameeva
- Subjects
0301 basic medicine ,Pan troglodytes ,brain ,In situ hybridization ,Biology ,snoRNA ,Macaque ,Protein Structure, Secondary ,Evolution, Molecular ,Mice ,03 medical and health sciences ,snRNA ,Phylogenetics ,biology.animal ,evolution ,Genetics ,Animals ,Humans ,RNA, Small Nucleolar ,snRNP ,human ,Small nucleolar RNA ,Protein secondary structure ,Phylogeny ,Ecology, Evolution, Behavior and Systematics ,Uncategorized ,Regulation of gene expression ,urogenital system ,Genetic Variation ,030104 developmental biology ,Gene Expression Regulation ,Mutation ,Macaca ,Small nuclear RNA ,Research Article - Abstract
Small nuclear and nucleolar RNAs (snRNAs and snoRNAs) are known to be functionally and evolutionarily conserved elements of transcript processing machinery. Here, we investigated the expression evolution of snRNAs and snoRNAs by measuring their abundance in the frontal cortex of humans, chimpanzees, rhesus monkeys, and mice. Although snRNA expression is largely conserved, 44% of the 185 measured snoRNAs and 40% of the 134 snoRNA families showed significant expression divergence among species. The snRNA and snoRNA expression divergence included drastic changes unique to humans: a 10-fold elevated expression of U1 snRNA and a 1,000-fold drop in expression of SNORA29. The decreased expression of SNORA29 might be due to two mutations that affect secondary structure stability. Using in situ hybridization, we further localized SNORA29 expression to nucleolar regions of neuronal cells. Our study presents the first observation of snoRNA abundance changes specific to the human lineage and suggests a possible mechanism underlying these changes.
- Published
- 2016
222. RegTransBase--a database of regulatory sequences and interactions based on literature: a resource for investigating transcriptional regulation in prokaryotes
- Author
-
Michael J. Cipriano, Dmitry A. Rodionov, Alexey E. Kazakov, Mikhail S. Gelfand, Pavel N Novichkov, Inna Dubchak, and Adam P. Arkin
- Subjects
Biology ,Proteomics ,computer.software_genre ,Regulon ,Database ,User-Computer Interface ,Transcriptional regulation ,Genetics ,Regulatory Elements, Transcriptional ,Prokaryotes ,Comparative genomics ,Regulation of gene expression ,Internet ,DNA binding site ,Gene Expression Regulation ,Prokaryotic Cells ,Regulatory sequence ,DNA microarray ,Databases, Nucleic Acid ,computer ,Genome, Bacterial ,Biotechnology ,Transcription Factors - Abstract
Background: Due to the constantly growing number of sequenced microbial genomes, comparative genomics has been playing a major role in the investigation of regulatory interactions in bacteria. Regulon inference mostly remains a field of semi-manual examination, since the absence of a knowledgebase and informatics platform for automated and systematic investigation restricts opportunities for computational prediction. Additionally, confirming computationally inferred regulons by experimental data is critically important. Description: RegTransBase is an open-access platform with a user-friendly web interface publicly available at http://regtransbase.lbl.gov. It consists of two databases: a manually collected hierarchical regulatory interactions database based on more than 7000 scientific papers, which can serve as a knowledgebase for verification of predictions, and a large set of expert-curated transcription factor binding sites used in regulon inference by a variety of tools. RegTransBase captures the knowledge from published scientific literature using controlled vocabularies and contains various types of experimental data, such as: the activation or repression of transcription by an identified direct regulator; determination of the transcriptional regulatory function of a protein (or RNA) directly binding to DNA or RNA; mapping of binding sites for a regulatory protein; characterization of regulatory mutations. Analysis of the data collected from literature resulted in the creation of Putative Regulons from Experimental Data that are also available in RegTransBase. Conclusions: RegTransBase is a powerful user-friendly platform for the investigation of regulation in prokaryotes. It uses a collection of validated regulatory sequences that can be easily extracted and used to infer regulatory interactions by comparative genomics techniques, thus assisting researchers in the interpretation of transcriptional regulation data.
- Published
- 2012
223. Biases in read coverage demonstrated by interlaboratory and interplatform comparison of 117 mRNA and genome sequencing experiments
- Author
-
Mikhail S. Gelfand and Ekaterina Khrameeva
- Subjects
Cancer genome sequencing ,Genetics ,Massive parallel sequencing ,Shotgun sequencing ,Genome, Human ,Sequence Analysis, RNA ,Applied Mathematics ,Sequence assembly ,High-Throughput Nucleotide Sequencing ,Hybrid genome assembly ,Computational biology ,Sequence Analysis, DNA ,Biology ,Biochemistry ,Deep sequencing ,Computer Science Applications ,Massively parallel signature sequencing ,MRNA Sequencing ,Proceedings ,Structural Biology ,Humans ,RNA, Messenger ,Transcriptome ,Molecular Biology - Abstract
High-throughput sequencing of whole genomes and transcriptomes allows one to generate large amounts of sequence data very rapidly and at a low cost. The goal of most mRNA sequencing studies is to perform the comparison of the expression level between different samples. However, given a broad variety of modern sequencing protocols, platforms and versions thereof, it is not clear to what extent the obtained results are consistent across platforms and laboratories. The comparison of 117 human mRNA and genome high-throughput sequencing experiments performed on the Illumina and SOLiD platforms at 26 institutions all over the world demonstrated high dependency of the gene coverage profiles on the producing laboratory. Gene coverage profiles showed laboratory-specific non-uniformity that survived the 3'-bias correction and mappability normalization, suggesting that there are other yet unknown mRNA-associated biases.
- Published
- 2012
224. Glutamine versus ammonia utilization in the NAD synthetase family
- Author
-
Jessica De Ingeniis, Konstantin Shatalin, Leonardo Sorci, Andrei L. Osterman, Mikhail S. Gelfand, and Marat D. Kazanov
- Subjects
Methanocaldococcus ,Evolutionary Processes ,Glutamine ,lcsh:Medicine ,Biochemistry ,Cofactor ,03 medical and health sciences ,Amide Synthases ,Ammonia ,Genome Analysis Tools ,Glutamine synthetase ,Macromolecular Structure Analysis ,lcsh:Science ,Biology ,Phylogeny ,030304 developmental biology ,Enzyme Kinetics ,0303 health sciences ,Evolutionary Biology ,Multidisciplinary ,biology ,Glutaminase ,Reverse Transcriptase Polymerase Chain Reaction ,Enzyme Classes ,Thermus thermophilus ,030302 biochemistry & molecular biology ,lcsh:R ,Computational Biology ,Genomics ,Comparative Genomics ,biology.organism_classification ,Enzymes ,biology.protein ,lcsh:Q ,NAD+ kinase ,Sequence Analysis ,Archaea ,Research Article - Abstract
NAD is a ubiquitous and essential metabolic redox cofactor which also functions as a substrate in certain regulatory pathways. The last step of NAD synthesis is the ATP-dependent amidation of deamido-NAD by NAD synthetase (NADS). Members of the NADS family are present in nearly all species across the three kingdoms of Life. In eukaryotic NADS, the core synthetase domain is fused with a nitrilase-like glutaminase domain supplying ammonia for the reaction. This two-domain NADS arrangement enabling the utilization of glutamine as nitrogen donor is also present in various bacterial lineages. However, many other bacterial members of NADS family do not contain a glutaminase domain, and they can utilize only ammonia (but not glutamine) in vitro. A single-domain NADS is also characteristic for nearly all Archaea, and its dependence on ammonia was demonstrated here for the representative enzyme from Methanocaldococcus jannaschii. However, a question about the actual in vivo nitrogen donor for single-domain members of the NADS family remained open: Is it glutamine hydrolyzed by a committed (but yet unknown) glutaminase subunit, as in most ATP-dependent amidotransferases, or free ammonia as in glutamine synthetase? Here we addressed this dilemma by combining evolutionary analysis of the NADS family with experimental characterization of two representative bacterial systems: a two-subunit NADS from Thermus thermophilus and a single-domain NADS from Salmonella typhimurium, providing evidence that ammonia (and not glutamine) is the physiological substrate of a typical single-domain NADS. The latter represents the most likely ancestral form of NADS. The ability to utilize glutamine appears to have evolved via recruitment of a glutaminase subunit followed by domain fusion in an early branch of Bacteria. Further evolution of the NADS family included lineage-specific loss of one of the two alternative forms and horizontal gene transfer events. Lastly, we identified NADS structural elements associated with glutamine-utilizing capabilities.
- Published
- 2012
225. Evolution of regulatory motifs of bacterial transcription factors
- Author
-
Vassily A. Lyubetsky, Mikhail S. Gelfand, Dmitry A. Rodionov, Olga N. Laikova, and Konstantin Yu. Gorbunov
- Subjects
Computational biology ,Lac repressor ,Biology ,Evolution, Molecular ,Bacterial Proteins ,Bacterial transcription ,Models of DNA evolution ,Consensus Sequence ,Genetics ,Regulatory Elements, Transcriptional ,Molecular Biology ,Transcription factor ,Gene ,Phylogeny ,Binding Sites ,Phylogenetic tree ,Bacteria ,Base Sequence ,Models, Genetic ,Promoter ,General Medicine ,DNA binding site ,Computational Mathematics ,Computational Theory and Mathematics ,Algorithms ,Transcription Factors - Abstract
Unlike evolution of genes and proteins, evolution of regulatory systems is a relatively new area of research. In particular, little systematic study has been done on evolution of DNA binding motifs in transcription factor families. We suggest an algorithm that reconstructs the most parsimonious scenario for changes in DNA binding motifs along an evolutionary tree of transcription factor binding sites. The algorithm was validated on several artificial datasets and then applied to reconstruct the evolutionary history of the NrdR, MntR, LacI, FNR, Irr, Fur and Rrf2 transcription factor families. The algorithm seems to be sufficiently robust to be applicable in realistic situations. In most transcription factor families the changes in binding motifs are limited to several branches. Changes in consensus nucleotides proceed via an intermediate stage when the respective position is not conserved.
- Published
- 2012
226. Comparative genomics of CytR, an unusual member of the LacI family of transcription factors
- Author
-
Mikhail S. Gelfand and Natalia V. Sernova
- Subjects
Operator Regions, Genetic ,lcsh:Medicine ,Genome ,Biochemistry ,Conserved sequence ,Molecular cell biology ,Lac Repressors ,Direct repeat ,lcsh:Science ,Phylogeny ,Genetics ,Multidisciplinary ,Escherichia coli Proteins ,Genomics ,Phylogenetics ,Research Article ,Protein Binding ,Enterobacteriales ,Protein Structure ,DNA transcription ,Molecular Sequence Data ,Sequence alignment ,Biology ,Microbiology ,Evolution, Molecular ,Escherichia coli ,Evolutionary Systematics ,Comparative genomics ,Evolutionary Biology ,Bacterial Evolution ,Binding Sites ,Base Sequence ,Bacterial Taxonomy ,lcsh:R ,Proteins ,Computational Biology ,Bacteriology ,Gene Expression Regulation, Bacterial ,Comparative Genomics ,biology.organism_classification ,Repressor Proteins ,Regulon ,bacteria ,lcsh:Q ,Gene expression ,Sequence Alignment ,Transcription Factors - Abstract
CytR is a transcription regulator from the LacI family, present in some gamma-proteobacteria including Escherichia coli and known not only for its cellular role, control of transport and utilization of nucleosides, but for a number of unusual structural properties. The present study addressed three related problems: structure of CytR-binding sites and motifs, their evolutionary conservation, and identification of new members of the CytR regulon. While the majority of CytR-binding sites are imperfect inverted repeats situated between binding sites for another transcription factor, CRP, other architectures were observed, in particular, direct repeats. While the similarity between sites for different genes in one genome is rather low, and hence the consensus motif is weak, there is high conservation of orthologous sites in different genomes (mainly in the Enterobacteriales) arguing for the presence of specific CytR-DNA contacts. On larger evolutionary distances candidate CytR sites may migrate but the approximate distance between flanking CRP sites tends to be conserved, which demonstrates that the overall structure of the CRP-CytR-DNA complex is gene-specific. The analysis yielded candidate CytR-binding sites for orthologs of known regulon members in less studied genomes of the Enterobacteriales and Vibrionales and identified a new candidate member of the CytR regulon, encoding a transporter named NupT (YcdZ).
- Published
- 2012
227. Evidence for Widespread Association of Mammalian Splicing and Conserved Long-Range RNA Structures
- Author
-
Mikhail S. Gelfand, Petr M. Rubtsov, Dmitri D. Pervouchine, Andrei V. Mironov, Oleksii Nikolaienko, Marina Yu. Pichugina, and Ekaterina Khrameeva
- Subjects
Genetics ,Splicing factor ,Exon ,RNA editing ,RNA splicing ,Alternative splicing ,Exonic splicing enhancer ,Intron ,Computational biology ,Biology ,Post-transcriptional modification - Abstract
Pre-mRNA structure impacts many cellular processes, including splicing in genes associated with disease. The contemporary paradigm of RNA structure prediction is biased toward secondary structures that occur within short ranges of pre-mRNA, although long-range base-pairings are known to be at least as important. Recently, we developed an efficient method for detecting conserved RNA structures on the genome-wide scale, one that does not require multiple sequence alignments and works equally well for the detection of local and long-range base-pairings. Using an enhanced method that detects base-pairings at all possible combinations of splice sites within each gene, we report a list of RNA structures that could be involved in the regulation of splicing in mammals. We demonstrate statistically that there is a strong association between the occurrence of conserved RNA structures and alternative splicing, where local RNA structures are generally more frequent at alternative donor splice sites, while long-range structures are more associated with weak alternative acceptor splice sites. A fraction of the reported structures is associated with unannotated splicing events that are confirmed by RNA-seq data. As an example, we validated the RNA structure in the human SF1 gene using mini-genes in the HEK293 cell line. Point mutations that disrupted the base-pairing of two complementary boxes between exons 9 and 10 of this gene altered the splicing pattern, while the compensatory mutations that reestablished the base-pairing reverted splicing to that of the wild-type. There is statistical evidence for a Dscam-like class of mammalian genes, in which mutually exclusive RNA structures control mutually exclusive alternative splicing. In sum, we propose that long-range base-pairings carry an important, yet unconsidered part of the splicing code, and that, even by modest estimates, there must be thousands of such potentially regulatory structures conserved throughout the evolutionary history of mammals.
- Published
- 2012
228. Evidence for widespread association of mammalian splicing and conserved long-range RNA structures
- Author
-
Mikhail S. Gelfand, Ekaterina Khrameeva, Andrei V. Mironov, Oleksii Nikolaienko, Marina Yu. Pichugina, Dmitri D. Pervouchine, and Petr M. Rubtsov
- Subjects
Bioinformatics ,RNA Splicing ,Molecular Sequence Data ,Exonic splicing enhancer ,Computational biology ,Biology ,Protein Serine-Threonine Kinases ,Exon ,Splicing factor ,Doublecortin-Like Kinases ,RNA Precursors ,Animals ,Humans ,Molecular Biology ,Conserved Sequence ,Genetics ,Base Sequence ,Sequence Analysis, RNA ,Alternative splicing ,Intron ,Intracellular Signaling Peptides and Proteins ,Alternative Splicing ,HEK293 Cells ,Regulatory sequence ,RNA splicing ,Nucleic Acid Conformation ,RNA Splice Sites ,Minigene - Abstract
Pre-mRNA structure impacts many cellular processes, including splicing in genes associated with disease. The contemporary paradigm of RNA structure prediction is biased toward secondary structures that occur within short ranges of pre-mRNA, although long-range base-pairings are known to be at least as important. Recently, we developed an efficient method for detecting conserved RNA structures on the genome-wide scale, one that does not require multiple sequence alignments and works equally well for the detection of local and long-range base-pairings. Using an enhanced method that detects base-pairings at all possible combinations of splice sites within each gene, we now report RNA structures that could be involved in the regulation of splicing in mammals. Statistically, we demonstrate strong association between the occurrence of conserved RNA structures and alternative splicing, where local RNA structures are generally more frequent at alternative donor splice sites, while long-range structures are more associated with weak alternative acceptor splice sites. As an example, we validated the RNA structure in the human SF1 gene using minigenes in the HEK293 cell line. Point mutations that disrupted the base-pairing of two complementary boxes between exons 9 and 10 of this gene altered the splicing pattern, while the compensatory mutations that reestablished the base-pairing reverted splicing to that of the wild-type. There is statistical evidence for a Dscam-like class of mammalian genes, in which mutually exclusive RNA structures control mutually exclusive alternative splicing. In sum, we propose that long-range base-pairings carry an important, yet unconsidered part of the splicing code, and that, even by modest estimates, there must be thousands of such potentially regulatory structures conserved throughout the evolutionary history of mammals.
- Published
- 2011
229. Temporal regulation of gene expression of the Escherichia coli bacteriophage phiEco32
- Author
-
Natalja Akulenko, Mikhail S. Gelfand, Marko Djordjevic, Olga Pavlova, Evgeny Klimuk, Konstantin Severinov, Daria Lavysh, and Dmitry A. Ravcheev
- Subjects
Gene Expression Regulation, Viral ,Transcriptional Activation ,Transcription, Genetic ,Molecular Sequence Data ,Sigma Factor ,Biology ,medicine.disease_cause ,Coliphages ,Article ,Bacteriophage ,03 medical and health sciences ,chemistry.chemical_compound ,Structural Biology ,Transcription (biology) ,Sigma factor ,RNA polymerase ,medicine ,Escherichia coli ,Promoter Regions, Genetic ,Molecular Biology ,Gene ,030304 developmental biology ,Regulation of gene expression ,Genetics ,0303 health sciences ,Base Sequence ,030302 biochemistry & molecular biology ,Computational Biology ,Promoter ,DNA-Directed RNA Polymerases ,biology.organism_classification ,Molecular biology ,chemistry - Abstract
Escherichia coli phage phiEco32 encodes two proteins that bind to host RNA polymerase — gp79, a novel protein, and gp36, a distant homolog of σ70 family proteins. Here, we investigated the temporal pattern of phiEco32 and host gene expression during the infection. Host transcription shut-off and three distinct bacteriophage temporal gene classes – early, middle, and late – were revealed. A combination of bioinformatic and biochemical approaches allowed identification of phage promoters recognized by host RNA polymerase holoenzyme containing the σ70 factor. These promoters are located upstream of early phage genes. A combination of macroarray data, primer extension, and in vitro transcription analyses allowed identification of six promoters recognized by RNA polymerase holoenzyme containing gp36. These promoters are characterized by a single consensus element tAATGTAtA and are located upstream of the middle and late phage genes. Curiously, gp79, an inhibitor of host and early phage transcription by σ70-holoenzyme, activated transcription by the gp36-holoenzyme in vitro.
- Published
- 2011
230. Dynamic programming: one algorithmic key for many biological locks
- Author
-
Pavel Pevzner, Ron Shamir, and Mikhail S. Gelfand
- Subjects
Dynamic programming ,Theoretical computer science ,Computer science ,Distributed computing ,Key (cryptography) - Published
- 2011
231. Comparative genomic reconstruction of transcriptional networks controlling central metabolism in the Shewanella genus
- Author
-
James K. Fredrickson, Andrei L. Osterman, Elena D. Stavrovskaya, Dmitry A. Rodionov, Galina Yu Kovaleva, Pavel S. Novichkov, Adam P. Arkin, Dmitry A. Ravcheev, Xiaoqing Li, Alexey E. Kazakov, Mikhail S. Gelfand, Marat D. Kazanov, Elizabeth A. Permina, Irina A. Rodionova, Ross Overbeek, Anna Gerasimova, Margaret F. Romine, Inna Dubchak, and Olga N. Laikova
- Subjects
Shewanella ,Gene regulatory network ,Genome ,Medical and Health Sciences ,Multidisciplinaire, généralités & autres [F99] [Sciences du vivant] ,2.2 Factors relating to the physical environment ,Gene Regulatory Networks ,Amino Acids ,Aetiology ,Genetics ,Regulation of gene expression ,0303 health sciences ,biology ,Escherichia coli Proteins ,Fatty Acids ,Bacterial ,Genomics ,Biological Sciences ,DNA-Binding Proteins ,Infectious Diseases ,Multigene Family ,Carbohydrate Metabolism ,Infection ,Biotechnology ,lcsh:QH426-470 ,Bioinformatics ,lcsh:Biotechnology ,Multidisciplinary, general & others [F99] [Life sciences] ,Regulon ,Acetylglucosamine ,03 medical and health sciences ,Bacterial Proteins ,Information and Computing Sciences ,lcsh:TP248.13-248.65 ,Escherichia coli ,030304 developmental biology ,Comparative genomics ,Binding Sites ,030306 microbiology ,Research ,Human Genome ,Gene Expression Regulation, Bacterial ,biology.organism_classification ,DNA binding site ,Repressor Proteins ,lcsh:Genetics ,Emerging Infectious Diseases ,Gene Expression Regulation ,Riboswitch ,bacteria ,Genome, Bacterial ,Transcription Factors - Abstract
Background: Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in bacteria is one of the critical tasks of modern genomics. The Shewanella genus is comprised of metabolically versatile gamma-proteobacteria, whose lifestyles and natural environments are substantially different from Escherichia coli and other model bacterial species. The comparative genomics approaches and computational identification of regulatory sites are useful for the in silico reconstruction of transcriptional regulatory networks in bacteria. Results: To explore conservation and variations in the Shewanella transcriptional networks we analyzed the repertoire of transcription factors and performed genomics-based reconstruction and comparative analysis of regulons in 16 Shewanella genomes. The inferred regulatory network includes 82 transcription factors and their DNA binding sites, 8 riboswitches and 6 translational attenuators. Forty-five regulons were newly inferred from the genome context analysis, whereas others were propagated from previously characterized regulons in the Enterobacteria and Pseudomonas spp. Multiple variations in regulatory strategies between the Shewanella spp. and E. coli include regulon contraction and expansion (as in the case of PdhR, HexR, FadR), numerous cases of recruiting non-orthologous regulators to control equivalent pathways (e.g. PsrA for fatty acid degradation) and, conversely, orthologous regulators to control distinct pathways (e.g. TyrR, ArgR, Crp). Conclusions: We tentatively defined the first reference collection of ~100 transcriptional regulons in 16 Shewanella genomes. The resulting regulatory network contains ~600 regulated genes per genome that are mostly involved in metabolism of carbohydrates, amino acids, fatty acids, vitamins, metals, and stress responses. Several reconstructed regulons including NagR for N-acetylglucosamine catabolism were experimentally validated in S. oneidensis MR-1. Analysis of correlations in gene expression patterns helps to interpret the reconstructed regulatory network. The inferred regulatory interactions will provide additional regulatory constraints for an integrated model of metabolism and regulation in S. oneidensis MR-1.
- Published
- 2011
232. Comparative genomic analysis of the hexuronate metabolism genes and their regulation in gammaproteobacteria
- Author
-
Mikhail S. Gelfand, Olga N. Ozoline, Dmitry A. Ravcheev, Inna A. Suvorova, Dmitry A. Rodionov, and Maria N. Tutukina
- Subjects
Subfamily ,Molecular Sequence Data ,Multidisciplinary, general & others [F99] [Life sciences] ,Biology ,Microbiology ,Regulon ,Multidisciplinaire, généralités & autres [F99] [Sciences du vivant] ,Bacterial Proteins ,Gammaproteobacteria ,Hexuronate ,Molecular Biology ,Transcription factor ,Gene ,Phylogeny ,Genetics ,Comparative genomics ,Regulation of gene expression ,Base Sequence ,Hexuronic Acids ,Computational Biology ,Gene Expression Regulation, Bacterial ,Genomics ,biology.organism_classification ,Biosynthetic Pathways ,bacteria ,Transcription Factors - Abstract
The hexuronate metabolism in Escherichia coli is regulated by two related transcription factors from the FadR subfamily of the GntR family, UxuR and ExuR. UxuR controls the D-glucuronate metabolism, while ExuR represses genes involved in the metabolism of all hexuronates. We use a comparative genomics approach to reconstruct the hexuronate metabolic pathways and transcriptional regulons in gammaproteobacteria. We demonstrate differences in the binding motifs of UxuR and ExuR, identify new candidate members of the UxuR/ExuR regulons, and describe the links between the UxuR/ExuR regulons and the adjacent regulons UidR, KdgR, and YjjM. We provide experimental evidence that two predicted members of the UxuR regulon, yjjM and yjjN, are the subject of complex regulation by this transcription factor in E. coli.
- Published
- 2011
233. Prediction of the exon-intron structure by a dynamic programming approach
- Author
-
Mikhail A. Roytberg and Mikhail S. Gelfand
- Subjects
Statistics and Probability ,Models, Genetic ,Applied Mathematics ,Intron ,Structure (category theory) ,Proteins ,DNA ,Exons ,General Medicine ,Computational biology ,Exon intron ,Introns ,General Biochemistry, Genetics and Molecular Biology ,Dynamic programming ,Exon ,Modeling and Simulation ,Algorithm ,Algorithms ,Mathematics ,Software - Published
- 1993
234. Large-scale identification and analysis of C-proteins
- Author
-
Valery Sorokin, Konstantin Severinov, and Mikhail S. Gelfand
- Subjects
Binding Sites ,Base Sequence ,Transcription, Genetic ,Computational Biology ,Reproducibility of Results ,Protein Multimerization ,Regulatory Sequences, Nucleic Acid ,Protein Structure, Quaternary ,Transcription Factors - Abstract
The restriction-modification system is a toxin-antitoxin mechanism of bacterial cells to resist phage attacks. High efficiency comes at a price of high maintenance costs: (1) a host cell dies whenever it loses restriction-modification genes and (2) whenever a plasmid with restriction-modification genes enters a naïve cell, modification enzyme (methylase) has to be expressed prior to the synthesis of the restriction enzyme (restrictase) or the cell dies. These phenomena imply a sophisticated regulatory mechanism. During the evolution several such mechanisms were developed, of which one relies on a special C(control)-protein, a short autoregulatory protein containing an HTH-domain. Given the extreme diversity among restriction-modification systems, one could expect that C-proteins had evolved into several groups that might differ in autoregulatory binding sites architecture. However, only a few C-proteins (and the corresponding binding sites) were known before this study. Bioinformatics studies applied to C-proteins and their binding sites were limited to groups of well-known C-proteins and lacked systematic analysis. In this work, the authors use bioinformatics techniques to discover 201 C-protein genes with predicted autoregulatory binding sites. The systematic analysis of the predicted sites allowed for the discovery of 10 structural classes of binding sites.
- Published
- 2010
235. An automated stochastic approach to the identification of the protein specificity determinants and functional subfamilies
- Author
-
Mikhail S. Gelfand, Olga V. Kalinina, Aleksandra B. Rakhmaninova, Pavel V. Mazin, Robert B. Russell, Andrey A. Mironov, and Anatoly R. Rubinov
- Subjects
lcsh:QH426-470 ,Phylogenetic tree ,Computer science ,Research ,Applied Mathematics ,Computational biology ,Ligand (biochemistry) ,computer.software_genre ,lcsh:Genetics ,Annotation ,lcsh:Biology (General) ,Computational Theory and Mathematics ,Structural Biology ,Identification (biology) ,Data mining ,lcsh:QH301-705.5 ,Molecular Biology ,computer - Abstract
Background Recent progress in sequencing and 3D structure determination techniques stimulated development of approaches aimed at more precise annotation of proteins, that is, prediction of exact specificity to a ligand or, more broadly, to a binding partner of any kind. Results We present a method, SDPclust, for identification of protein functional subfamilies coupled with prediction of specificity-determining positions (SDPs). SDPclust predicts specificity in a phylogeny-independent stochastic manner, which allows for the correct identification of the specificity for proteins that are separated on a phylogenetic tree, but still bind the same ligand. SDPclust is implemented as a Web-server http://bioinf.fbb.msu.ru/SDPfoxWeb/ and a stand-alone Java application available from the website. Conclusions SDPclust performs a simultaneous identification of specificity determinants and specificity groups in a statistically robust and phylogeny-independent manner.
- Published
- 2010
236. RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach
- Author
-
Mikhail S. Gelfand, Elena D. Stavrovskaya, Dmitry A. Rodionov, Inna Dubchak, Andrey A. Mironov, Alexey E. Kazakov, Pavel S. Novichkov, Adam P. Arkin, and Elena S. Novichkova
- Subjects
Web server ,MicrobesOnline ,Operon ,Genomics ,Computational biology ,Biology ,computer.software_genre ,Genome ,Regulon ,Staphylococcaceae ,03 medical and health sciences ,User-Computer Interface ,Genetics ,030304 developmental biology ,Comparative genomics ,0303 health sciences ,Internet ,030306 microbiology ,Articles ,Systems Integration ,bacteria ,User interface ,computer ,Genome, Bacterial ,Software - Abstract
RegPredict web server is designed to provide comparative genomics tools for reconstruction and analysis of microbial regulons using comparative genomics approach. The server allows the user to rapidly generate reference sets of regulons and regulatory motif profiles in a group of prokaryotic genomes. The new concept of a cluster of co-regulated orthologous operons allows the user to distribute the analysis of large regulons and to perform the comparative analysis of multiple clusters independently. Two major workflows currently implemented in RegPredict are: (i) regulon reconstruction for a known regulatory motif and (ii) ab initio inference of a novel regulon using several scenarios for the generation of starting gene sets. RegPredict provides a comprehensive collection of manually curated positional weight matrices of regulatory motifs. It is based on genomic sequences, ortholog and operon predictions from the MicrobesOnline. An interactive web interface of RegPredict integrates and presents diverse genomic and functional information about the candidate regulon members from several web resources. RegPredict is freely accessible at http://regpredict.lbl.gov.
- Published
- 2010
237. RegPrecise: a database of curated genomic inferences of transcriptional regulatory interactions in prokaryotes
- Author
-
Inna Dubchak, Olga N. Laikova, Elena S. Novichkova, Mikhail S. Gelfand, Pavel S. Novichkov, Adam P. Arkin, and Dmitry A. Rodionov
- Subjects
Operon ,Amino Acid Motifs ,Information Storage and Retrieval ,Sequence alignment ,Biology ,computer.software_genre ,Genome ,03 medical and health sciences ,Databases ,Genetic ,Information and Computing Sciences ,Databases, Genetic ,Genetics ,Taxonomic rank ,Databases, Protein ,Gene ,Transcription factor ,030304 developmental biology ,0303 health sciences ,Internet ,Binding Sites ,Database ,Nucleic Acid ,030306 microbiology ,Protein ,Bacterial ,Computational Biology ,Articles ,DNA ,Biological Sciences ,DNA binding site ,Regulon ,bacteria ,Databases, Nucleic Acid ,computer ,Sequence Alignment ,Genome, Bacterial ,Algorithms ,Software ,Environmental Sciences ,Transcription Factors ,Developmental Biology - Abstract
The RegPrecise database (http://regprecise.lbl.gov) was developed for capturing, visualization and analysis of predicted transcription factor regulons in prokaryotes that were reconstructed and manually curated by utilizing the comparative genomic approach. A significant number of high-quality inferences of transcriptional regulatory interactions have been already accumulated for diverse taxonomic groups of bacteria. The reconstructed regulons include transcription factors, their cognate DNA motifs and regulated genes/operons linked to the candidate transcription factor binding sites. The RegPrecise allows for browsing the regulon collections for: (i) conservation of DNA binding sites and regulated genes for a particular regulon across diverse taxonomic lineages; (ii) sets of regulons for a family of transcription factors; (iii) repertoire of regulons in a particular taxonomic group of species; (iv) regulons associated with a metabolic pathway or a biological process in various genomes. The initial release of the database includes approximately 11,500 candidate binding sites for approximately 400 orthologous groups of transcription factors from over 350 prokaryotic genomes. Majority of these data are represented by genome-wide regulon reconstructions in Shewanella and Streptococcus genera and a large-scale prediction of regulons for the LacI family of transcription factors. Another section in the database represents the results of accurate regulon propagation to the closely related genomes.
- Published
- 2010
238. Large-Scale Identification and Analysis of C-Proteins
- Author
-
Valery Sorokin, Konstantin Severinov, and Mikhail S. Gelfand
- Subjects
chemistry.chemical_classification ,Restriction enzyme ,Plasmid ,Enzyme ,Protein structure ,chemistry ,Regulatory sequence ,Transcription (biology) ,Computational biology ,Binding site ,Biology ,Gene - Abstract
The restriction-modification system is a toxin-antitoxin mechanism of bacterial cells to resist phage attacks. High efficiency comes at a price of high maintenance costs: (1) a host cell dies whenever it loses restriction-modification genes and (2) whenever a plasmid with restriction-modification genes enters a naive cell, modification enzyme (methylase) has to be expressed prior to the synthesis of the restriction enzyme (restrictase) or the cell dies. These phenomena imply a sophisticated regulatory mechanism. During the evolution several such mechanisms were developed, of which one relies on a special C(control)-protein, a short autoregulatory protein containing an HTH-domain. Given the extreme diversity among restriction-modification systems, one could expect that C-proteins had evolved into several groups that might differ in autoregulatory binding sites architecture. However, only a few C-proteins (and the corresponding binding sites) were known before this study. Bioinformatics studies applied to C-proteins and their binding sites were limited to groups of well-known C-proteins and lacked systematic analysis. In this work, the authors use bioinformatics techniques to discover 201 C-protein genes with predicted autoregulatory binding sites. The systematic analysis of the predicted sites allowed for the discovery of 10 structural classes of binding sites.
- Published
- 2010
239. Positive Selection and Alternative Splicing in Humans
- Author
-
Mikhail S. Gelfand and Vasily Ramensky
- Subjects
Genetics ,Negative selection ,Genome evolution ,Molecular evolution ,Alternative splicing ,Human genome ,Biology ,McDonald–Kreitman test ,Gene ,Selection (genetic algorithm) - Abstract
Alternative splicing is an important mechanism of generating protein diversity and accelerated genome evolution. The mode of the selection acting in constitutive, major alternative and minor alternative regions of human genes is different. Whereas constitutive and major alternative regions tend to evolve under negative (stabilizing) selection, alternatively spliced exons from minor isoforms experience lower selective pressure at the amino acid level accompanied by weak selection against synonymous sequence variation. The McDonald–Kreitman test uses the nucleotide variation for a gene or a set of genes between and within species to detect the positive Darwinian selection in the presence of negative selection. The results of the test suggest that alternatively spliced exons are also subject to positive selection, with up to 27% of amino acids fixed by positive selection. Key concepts: Alternative splicing is an important mechanism of generating protein diversity and accelerated genome evolution. Alternatively spliced regions are often evolutionarily young. There is a difference in the selection mode in constitutive, major alternative, and minor alternative regions of human genes. Constitutive and major alternative regions evolve under negative (stabilizing) selection. Up to 27% of positions in minor alternative regions may be experiencing positive selection. Keywords: alternative splicing; positive selection; human genome; molecular evolution; McDonald–Kreitman test
- Published
- 2009
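Note on entry 239: the abstract refers to the McDonald–Kreitman test and to a fraction of amino acids fixed by positive selection. The sketch below shows the standard arithmetic behind such an estimate (neutrality index and alpha) from four counts. It is only an illustration and not the authors' pipeline; the class name, function names, and the toy counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MKCounts:
    """Site counts for a McDonald-Kreitman table (hypothetical container)."""
    dn: int  # nonsynonymous fixed differences between species
    ds: int  # synonymous fixed differences between species
    pn: int  # nonsynonymous polymorphisms within a species
    ps: int  # synonymous polymorphisms within a species


def neutrality_index(c: MKCounts) -> float:
    """NI = (Pn/Ps) / (Dn/Ds); values below 1 suggest an excess of fixed
    nonsynonymous changes, as expected under positive selection."""
    if c.ps == 0 or c.dn == 0:
        raise ValueError("Ps and Dn must be non-zero to compute NI")
    return (c.pn * c.ds) / (c.ps * c.dn)


def alpha(c: MKCounts) -> float:
    """Estimated fraction of nonsynonymous substitutions fixed by positive
    selection: alpha = 1 - NI."""
    return 1.0 - neutrality_index(c)


# Toy counts chosen for easy arithmetic; NOT data from the paper.
toy = MKCounts(dn=40, ds=100, pn=30, ps=100)
print(f"NI = {neutrality_index(toy):.2f}, alpha = {alpha(toy):.2f}")
```

With these toy counts the ratio of nonsynonymous to synonymous changes is 0.4 among fixed differences and 0.3 among polymorphisms, so the estimate attributes a quarter of the fixed nonsynonymous changes to positive selection.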
240. Rodent-specific alternative exons are more frequent in rapidly evolving genes and in paralogs
- Author
-
Andrey A. Mironov, Mikhail S. Gelfand, and Ramil N. Nurtdinov
- Subjects
Genetics ,Evolution ,Alternative splicing ,Sequence alignment ,Exons ,Biology ,Genome ,Rats ,Conserved sequence ,Evolution, Molecular ,Alternative Splicing ,Mice ,Exon ,Dogs ,QH359-425 ,Animals ,Humans ,Gene family ,Sequence Alignment ,Gene ,Conserved Sequence ,Ecology, Evolution, Behavior and Systematics ,Research Article ,Sequence (medicine) - Abstract
Background Alternative splicing is an important mechanism for generating functional and evolutionary diversity of proteins in eukaryotes. Here, we studied the frequency and functionality of recently gained, rodent-specific alternative exons. Results We projected the data about alternative splicing of mouse genes to the rat, human, and dog genomes, and identified exons conserved in the rat genome, but missing in more distant genomes. We estimated the frequency of rodent-specific exons while controlling for possible residual conservation of spurious exons. The frequency of rodent-specific exons is higher among predominantly skipped exons and exons disrupting the reading frame. Separation of all genes by the rate of sequence evolution and by gene families has demonstrated that rodent-specific cassette exons are more frequent in rapidly evolving genes and in rodent-specific paralogs. Conclusion Thus we demonstrated that recently gained exons tend to occur in fast-evolving genes, and their inclusion rate tends to be lower than that of older exons. This agrees with the theory that gain of alternative exons is one of the major mechanisms of gene evolution.
- Published
- 2009
241. Engineering transcription factors with novel DNA-binding specificity using comparative genomics
- Author
-
Mikhail S. Gelfand, Tasha A. Desai, Eric J. Alm, Dmitry A. Rodionov, and Christopher V. Rao (Massachusetts Institute of Technology, Department of Biological Engineering; Massachusetts Institute of Technology, Department of Civil and Environmental Engineering)
- Subjects
Iron-Sulfur Proteins ,Operator (biology) ,Cyclic AMP Receptor Protein ,Operator Regions, Genetic ,Transcription, Genetic ,Biology ,Protein Engineering ,03 medical and health sciences ,chemistry.chemical_compound ,0302 clinical medicine ,Transcription (biology) ,Bacterial transcription ,Genes, Reporter ,RNA polymerase ,Genetics ,Transcription factor ,Gene ,030304 developmental biology ,0303 health sciences ,General transcription factor ,Escherichia coli Proteins ,Computational Biology ,Genomics ,3. Good health ,DNA-Binding Proteins ,cAMP receptor protein ,chemistry ,Amino Acid Substitution ,Mutation ,biology.protein ,030217 neurology & neurosurgery ,Transcription Factors - Abstract
The transcriptional program for a gene consists of the promoter necessary for recruiting RNA polymerase along with neighboring operator sites that bind different activators and repressors. From a synthetic biology perspective, if the DNA-binding specificity of these proteins can be changed, then they can be used to reprogram gene expression in cells. While many experimental methods exist for generating such specificity-altering mutations, few computational approaches are available, particularly in the case of bacterial transcription factors. In a previously published computational study of nitrogen oxide metabolism in bacteria, a small number of amino-acid residues were found to determine the specificity within the CRP (cAMP receptor protein)/FNR (fumarate and nitrate reductase regulatory protein) family of transcription factors. By analyzing how these amino acids vary in different regulators, a simple relationship between the identity of these residues and their target DNA-binding sequence was constructed. In this article, we experimentally tested whether this relationship could be used to engineer novel DNA–protein interactions. Using Escherichia coli CRP as a template, we tested eight designs based on this relationship and found that four worked as predicted. Collectively, the results of this work demonstrate that comparative genomics can inform the design of bacterial transcription factors. Funding: National Science Foundation (U.S.) (CAREER award CBET-0644744), United States. Dept. of Energy, Howard Hughes Medical Institute (55005610), Russian Foundation for Basic Research (08-04-01000), Russian Academy of Sciences
- Published
- 2009
242. Comparative Genomics of Regulation of Fatty Acid and Branched-chain Amino Acid Utilization in Proteobacteria
- Author
-
Dmitry A. Rodionov, Eric J. Alm, Alexey E. Kazakov, Adam P. Arkin, Inna Dubchak, and Mikhail S. Gelfand
- Subjects
Shewanella ,Microbiology ,Regulon ,chemistry.chemical_compound ,Bacterial Proteins ,Leucine ,Gammaproteobacteria ,Proteobacteria ,Escherichia coli ,TetR ,Molecular Biology ,Gene ,Betaproteobacteria ,Phylogeny ,Genetics ,biology ,Base Sequence ,Fatty Acids ,Computational Biology ,Genomics ,biology.organism_classification ,DNA binding site ,Branched-Chain Amino Acid ,Biochemistry ,chemistry ,Pseudomonas aeruginosa ,Energy source ,Amino Acids, Branched-Chain ,Genome, Bacterial ,Fatty Acid - Abstract
Bacteria can use branched-chain amino acids (ILV, i.e., isoleucine, leucine, valine) and fatty acids (FAs) as sole carbon and energy sources converting ILV into acetyl-coenzyme A (CoA), propanoyl-CoA, and propionyl-CoA, respectively. In this work, we used the comparative genomic approach to identify candidate transcriptional factors and DNA motifs that control ILV and FA utilization pathways in proteobacteria. The metabolic regulons were characterized based on the identification and comparison of candidate transcription factor binding sites in groups of phylogenetically related genomes. The reconstructed ILV/FA regulatory network demonstrates considerable variability and involves six transcriptional factors from the MerR, TetR, and GntR families binding to 11 distinct DNA motifs. The ILV degradation genes in gamma- and betaproteobacteria are regulated mainly by a novel regulator from the MerR family (e.g., LiuR in Pseudomonas aeruginosa) (40 species); in addition, the TetR-type regulator LiuQ was identified in some betaproteobacteria (eight species). Besides the core set of ILV utilization genes, the LiuR regulon in some lineages is expanded to include genes from other metabolic pathways, such as the glyoxylate shunt and glutamate synthase in Shewanella species. The FA degradation genes are controlled by four regulators including FadR in gammaproteobacteria (34 species), PsrA in gamma- and betaproteobacteria (45 species), FadP in betaproteobacteria (14 species), and LiuR orthologs in alphaproteobacteria (22 species). The remarkable variability of the regulatory systems associated with the FA degradation pathway is discussed from functional and evolutionary points of view.
- Published
- 2009
243. A novel class of modular transporters for vitamins in prokaryotes
- Author
-
Aymerick Eudes, Andrei L. Osterman, Dirk Jan Slotboom, Andrew D. Hanson, Thomas Eitinger, Josy ter Beek, Dmitry A. Rodionov, Mikhail S. Gelfand, Irina A. Rodionova, Peter Hebbeln, and Guus B. Erkens
- Subjects
COMPARATIVE GENOMICS ,LACTOBACILLUS-CASEI ,TRYPTOPHAN TRANSPORT ,RNA STRUCTURE ,Restriction Mapping ,Tripartite ATP-independent periplasmic transporter ,ATP-binding cassette transporter ,Context (language use) ,Biology ,Microbiology ,BACILLUS-SUBTILIS ,Bacterial Proteins ,Nickel ,BINDING CASSETTE TRANSPORTERS ,Cloning, Molecular ,Databases, Protein ,Molecular Biology ,Integral membrane protein ,Tryptophan transport ,ATP-binding domain of ABC transporters ,GENE-EXPRESSION ,SOLUTE TRANSPORTERS ,Genome ,Bacteria ,Cell Membrane ,Computational Biology ,Membrane Transport Proteins ,Transporter ,Cobalt ,Vitamins ,Transmembrane protein ,Biochemistry ,ATP-Binding Cassette Transporters ,Leuconostoc ,REGULATORY SYSTEMS - Abstract
The specific and tightly controlled transport of numerous nutrients and metabolites across cellular membranes is crucial to all forms of life. However, many of the transporter proteins involved have yet to be identified, including the vitamin transporters in various human pathogens, whose growth depends strictly on vitamin uptake. Comparative analysis of the ever-growing collection of microbial genomes coupled with experimental validation enables the discovery of such transporters. Here, we used this approach to discover an abundant class of vitamin transporters in prokaryotes with an unprecedented architecture. These transporters have energy-coupling modules comprised of a conserved transmembrane protein and two nucleotide binding proteins similar to those of ATP binding cassette (ABC) transporters, but unlike ABC transporters, they use small integral membrane proteins to capture specific substrates. We identified 21 families of these substrate capture proteins, each with a different specificity predicted by genome context analyses. Roughly half of the substrate capture proteins (335 cases) have a dedicated energizing module, but in 459 cases distributed among almost 100 gram-positive bacteria, including numerous human pathogens, different and unrelated substrate capture proteins share the same energy-coupling module. The shared use of energy-coupling modules was experimentally confirmed for folate, thiamine, and riboflavin transporters. We propose the name energy-coupling factor transporters for the new class of membrane transporters.
- Published
- 2008
244. Identification of replication origins in prokaryotic genomes
- Author
-
Natalia V. Sernova and Mikhail S. Gelfand
- Subjects
Comparative genomics ,Genetics ,Base Sequence ,Molecular Sequence Data ,Chromosome Mapping ,GC skew ,Replication Origin ,Computational biology ,Bacterial genome size ,Sequence Analysis, DNA ,Biology ,Chromosomes, Bacterial ,Origin of replication ,Genome ,DnaA ,Replication (statistics) ,Identification (biology) ,Molecular Biology ,Algorithms ,Software ,Information Systems - Abstract
The availability of hundreds of complete bacterial genomes has created new challenges and simultaneously opportunities for bioinformatics. In the area of statistical analysis of genomic sequences, the studies of nucleotide compositional bias and gene bias between strands and replichores paved the way to the development of tools for prediction of bacterial replication origins. Only a few (about 20) origin regions for eubacteria and archaea have been proven experimentally. One reason for that may be that this is now considered an essentially bioinformatics problem, where predictions are sufficiently reliable not to run labor-intensive experiments, unless specifically needed. Here we describe the main existing approaches to the identification of replication origin (oriC) and termination (terC) loci in prokaryotic chromosomes and characterize a number of computational tools based on various skew types and other types of evidence. We also classify the eubacterial and archaeal chromosomes by predictability of their replication origins using skew plots. Finally, we discuss possible combined approaches to the identification of the oriC sites that may be used to improve the prediction tools, in particular, the analysis of DnaA binding sites using the comparative genomic methods.
- Published
- 2008
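Note on entry 244: the abstract mentions origin prediction from skew plots. As a rough illustration of the general idea (not any specific tool reviewed in the paper), the sketch below computes a cumulative GC skew over non-overlapping windows and takes its minimum and maximum as candidate origin and terminus positions. It assumes the common case of a G-rich leading strand; the function names, the window size, and the toy sequence are hypothetical.

```python
def cumulative_gc_skew(seq: str, window: int) -> list[float]:
    """Cumulative sum of (G - C) / (G + C) over non-overlapping windows."""
    total = 0.0
    cumulative = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:  # windows without G or C contribute zero skew
            total += (g - c) / (g + c)
        cumulative.append(total)
    return cumulative


def predict_ori_ter(seq: str, window: int) -> tuple[int, int]:
    """Return (origin, terminus) coordinates as the ends of the windows where
    the cumulative skew reaches its minimum and maximum, respectively."""
    cum = cumulative_gc_skew(seq, window)
    if not cum:
        raise ValueError("sequence is shorter than one window")
    ori_window = min(range(len(cum)), key=cum.__getitem__)
    ter_window = max(range(len(cum)), key=cum.__getitem__)
    return (ori_window + 1) * window, (ter_window + 1) * window


# Toy sequence: C-rich, then G-rich, then C-rich again (not real genome data).
toy_genome = "ACCT" * 50 + "AGGT" * 100 + "ACCT" * 50
print(predict_ori_ter(toy_genome, window=40))
```

On the toy sequence the skew switches from negative to positive at position 200 and back at position 600, which is where the sketch places the candidate origin and terminus. Real chromosomes are circular and noisy, so in practice such a prediction would be combined with other evidence, such as the DnaA binding sites the abstract mentions.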
245. Transcriptional regulation of NAD metabolism in bacteria: genomic reconstruction of NiaR (YrxA) regulon
- Author
-
Chen Yang, Hong Zhang, Etienne Dervyn, Dariusz Martynowski, Xiaoqing Li, Irina A. Rodionova, Andrei L. Osterman, Mikhail S. Gelfand, Leonardo Sorci, and Dmitry A. Rodionov; Sanford Burnham Prebys Medical Discovery Institute, Institute for Information Transmission Problems, Russian Academy of Sciences [Moscow] (RAS), Unité de recherche Génétique Microbienne (UGM), Institut National de la Recherche Agronomique (INRA), University of Texas Southwestern Medical Center, Fellowship for Interpretation of Genomes, National Institute of Allergy and Infectious Diseases (NIAID) ‘Genomics of Coenzyme Metabolism in Bacterial Pathogens’ [1-R01-AI066244-01A2], Program ‘Molecular and Cellular Biology’ of the Russian Academy of Sciences, Howard Hughes International Research, and National Institute of Health [1-R01-AI066244-01A2]
- Subjects
Transcription, Genetic ,[SDV]Life Sciences [q-bio] ,Bacillus subtilis ,Niacin ,Regulon ,Genome ,03 medical and health sciences ,Bacterial Proteins ,Genetics ,Transcriptional regulation ,Thermotoga maritima ,Electrophoretic mobility shift assay ,Regulatory Elements, Transcriptional ,Gene ,030304 developmental biology ,0303 health sciences ,Binding Sites ,biology ,030306 microbiology ,Membrane Transport Proteins ,Genomics ,Gene Expression Regulation, Bacterial ,NAD ,biology.organism_classification ,Repressor Proteins ,Biochemistry ,NAD+ kinase ,Genome, Bacterial - Abstract
A comparative genomic approach was used to reconstruct transcriptional regulation of NAD biosynthesis in bacteria containing orthologs of Bacillus subtilis gene yrxA, a previously identified niacin-responsive repressor of NAD de novo synthesis. Members of YrxA family (re-named here NiaR) are broadly conserved in the Bacillus/Clostridium group and in the deeply branching Fusobacteria and Thermotogales lineages. We analyzed upstream regions of genes associated with NAD biosynthesis to identify candidate NiaR-binding DNA motifs and assess the NiaR regulon content in these species. Representatives of the two distinct types of candidate NiaR-binding sites, characteristic of the Firmicutes and Thermotogales, were verified by an electrophoretic mobility shift assay. In addition to transcriptional control of the nadABC genes, the NiaR regulon in some species extends to niacin salvage (the pncAB genes) and includes uncharacterized membrane proteins possibly involved in niacin transport. The involvement in niacin uptake proposed for one of these proteins (re-named NiaP), encoded by the B. subtilis gene yceI, was experimentally verified. In addition to bacteria, members of the NiaP family are conserved in multicellular eukaryotes, including human, pointing to possible NiaP involvement in niacin utilization in these organisms. Overall, the analysis of the NiaR and NrtR regulons (described in the accompanying paper) revealed mechanisms of transcriptional regulation of NAD metabolism in nearly a hundred diverse bacteria.
- Published
- 2008
246. Low-molecular-weight post-translationally modified microcins
- Author
-
Mikhail S. Gelfand, Alexey E. Kazakov, Ekaterina Semenova, Teymur Kazakov, and Konstantin Severinov
- Subjects
chemistry.chemical_classification ,biology ,Bacteria ,Molecular Sequence Data ,Peptide ,Microcin ,biology.organism_classification ,medicine.disease_cause ,Microbiology ,Enterobacteriaceae ,Molecular Weight ,Biochemistry ,Bacteriocin ,chemistry ,Bacteriocins ,Genes, Bacterial ,medicine ,Amino Acid Sequence ,Molecular Biology ,Escherichia coli ,Peptide sequence ,Protein Processing, Post-Translational ,Phylogeny ,Antibacterial agent - Abstract
Microcins are a class of ribosomally synthesized antibacterial peptides produced by Enterobacteriaceae and active against closely related bacterial species. While some microcins are active as unmodified peptides, others are heavily modified by dedicated maturation enzymes. Low-molecular-weight microcins from the post-translationally modified group target essential molecular machines inside the cells. In this review, available structural and functional data about three such microcins (microcin J25, microcin B17 and microcin C7-C51) are discussed. While all three low-molecular-weight post-translationally modified microcins are produced by Escherichia coli, inferences based on sequence and structural similarities with peptides encoded or produced by phylogenetically diverse bacteria are made whenever possible to put these compounds into a larger perspective.
- Published
- 2007
247. Comparative genomics and evolution of alternative splicing: the pessimists' science
- Author
-
Irena I. Artamonova and Mikhail S. Gelfand
- Subjects
Comparative genomics ,Chemistry ,Alternative splicing ,Genomics ,General Chemistry ,General Medicine ,Computational biology ,Exons ,Introns ,Evolution, Molecular ,Alternative Splicing ,Biochemistry ,Animals ,Humans - Published
- 2007
248. A model of evolution with constant selective pressure for regulatory DNA sites
- Author
-
Ekaterina A. Kotelnikova, Vsevolod J. Makeev, Mikhail S. Gelfand, and Farida Enikeeva
- Subjects
Mutation rate ,Binding Sites ,Multiple sequence alignment ,Stationary distribution ,Base Sequence ,Evolution ,DNA ,Regulatory Sequences, Nucleic Acid ,Biology ,Models, Biological ,Evolution, Molecular ,DNA binding site ,Fixation (population genetics) ,Substitution model ,Evolutionary biology ,Molecular evolution ,Consensus Sequence ,Consensus sequence ,QH359-425 ,Ecology, Evolution, Behavior and Systematics ,Research Article ,Transcription Factors - Abstract
Background Molecular evolution is usually described assuming a neutral or weakly non-neutral substitution model. Recently, new data have become available on evolution of sequence regions under a selective pressure, e.g. transcription factor binding sites. To reconstruct the evolutionary history of such sequences, one needs evolutionary models that take into account a substantial constant selective pressure. Results We present a simple evolutionary model with a single preferred (consensus) nucleotide and the neutral substitution model adopted for all other nucleotides. This evolutionary model has a rate matrix in which all substitutions that do not involve the consensus nucleotide occur with the same rate. The model has two time scales for achieving a stationary distribution; in the general case only one of the two rate parameters can be evaluated from the stationary distribution. In the middle-time zone, a counterintuitive behavior was observed for some parameter values, with a probability of conservation for a non-consensus nucleotide greater than that for the consensus nucleotide. Such an effect can be observed only in the case of weak preference for the consensus nucleotide, when the probability to observe the consensus nucleotide in the stationary distribution is less than 1/2. If the substitution rate is represented as a product of mutation and fixation, only the fixation can be calculated from the stationary distribution. The exhibited conservation of non-consensus nucleotides does not take place if the elements of mutation matrix are identical, and can be related to the reduced mutation rate between the non-consensus nucleotides. This bias can have no effect on the stationary distribution of nucleotide frequencies calculated over the ensemble of multiple alignments, e.g. transcription factor binding sites upstream of different sets of co-regulated orthologous genes. 
Conclusion The derived model can be used as a null model when analyzing the evolution of orthologous transcription factor binding sites. In particular, our findings show that a nucleotide preferred at some position of a multiple alignment of binding sites for some transcription factor in the same genome is not necessarily the most conserved nucleotide in an alignment of orthologous sites from different species. However, this effect can take place only in the case of a mutation matrix whose elements are not identical.
- Published
- 2007
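The rate-matrix description in the abstract of the entry above can be illustrated with a small numerical sketch. The three-rate parametrization below (`a` between non-consensus nucleotides, `b` from a non-consensus nucleotide to the consensus, `g` from the consensus to each non-consensus nucleotide), the function names, and the example values are assumptions chosen for illustration; they are not taken from the paper and may differ from the authors' own parametrization. The transition matrix is approximated by repeated squaring of a first-order step rather than by an exact matrix exponential.

```python
# Toy 4-state substitution model with one consensus nucleotide.
# Index 0 is the consensus; indices 1..3 are the non-consensus nucleotides.
# The parameters a, b, g and all function names are illustrative assumptions.

def rate_matrix(a, b, g):
    """Build the rate matrix Q (rows sum to zero).
    a: rate between two non-consensus nucleotides
    b: rate from a non-consensus nucleotide to the consensus
    g: rate from the consensus to each non-consensus nucleotide
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i == j:
                continue
            if i == 0:
                q[i][j] = g
            elif j == 0:
                q[i][j] = b
            else:
                q[i][j] = a
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this is minus the off-diagonal sum
    return q

def stationary(b, g):
    """Stationary distribution of the toy model; it does not depend on a."""
    total = b + 3.0 * g
    return [b / total, g / total, g / total, g / total]

def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(q, t, squarings=20):
    """Approximate exp(Q t) as (I + Q t / 2^s)^(2^s) with s = squarings."""
    n = len(q)
    h = t / (2 ** squarings)
    p = [[(1.0 if i == j else 0.0) + q[i][j] * h for j in range(n)]
         for i in range(n)]
    for _ in range(squarings):
        p = mat_mul(p, p)
    return p

# Example values (assumed): weak preference for the consensus (pi_c = 0.4 < 1/2)
# and a reduced rate between non-consensus nucleotides.
q = rate_matrix(a=0.05, b=1.0, g=0.5)
pi = stationary(b=1.0, g=0.5)
p = transition_matrix(q, t=0.5)
conserved_consensus = p[0][0]      # probability the consensus is still observed at time t
conserved_non_consensus = p[1][1]  # same for one non-consensus nucleotide
```

In this toy setting the stationary distribution depends only on the ratio of `g` to `b`, while the rate `a` affects only how fast the non-consensus nucleotides equilibrate among themselves. With the assumed values, the diagonal entry for a non-consensus nucleotide at the intermediate time exceeds that for the consensus, which is the kind of behavior the abstract describes.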
249. Conserved and species-specific alternative splicing in mammalian genomes
- Author
-
Alexey D. Neverov, Mikhail S. Gelfand, Alexander V. Favorov, Andrey A. Mironov, and Ramil N. Nurtdinov
- Subjects
Evolution ,Exonic splicing enhancer ,Computational biology ,Biology ,Genome ,Conserved sequence ,Exon ,Mice ,Dogs ,Species Specificity ,QH359-425 ,Animals ,Humans ,Amino Acid Sequence ,Ecology, Evolution, Behavior and Systematics ,Conserved Sequence ,Genetics ,Expressed Sequence Tags ,Mammals ,Expressed sequence tag ,Base Sequence ,Genome, Human ,Alternative splicing ,Exons ,Alternative Splicing ,Proteome ,Human genome ,RNA Splice Sites ,Research Article - Abstract
Background: Alternative splicing has been shown to be one of the major evolutionary mechanisms for protein diversification and proteome expansion, since a considerable fraction of alternative splicing events appears to be species- or lineage-specific. However, most studies were restricted to the analysis of cassette exons in pairs of genomes and did not analyze functionality of the alternative variants. Results: We analyzed conservation of human alternative splice sites and cassette exons in the mouse and dog genomes. Alternative exons, especially minor-isoform ones, were shown to be less conserved than constitutive exons. Frame-shifting alternatives in the protein-coding regions are less conserved than frame-preserving ones. Similarly, the conservation of alternative sites is highest for evenly used alternatives, and higher when the distance between the sites is divisible by three. The rate of alternative-exon and site loss in mouse is slightly higher than in dog, consistent with faster evolution of the former. The evolutionary dynamics of alternative sites was shown to be consistent with the model of random activation of cryptic sites. Conclusion: Consistent with other studies, our results show that minor cassette exons are less conserved than major-alternative and constitutive exons. However, our study provides evidence that this is caused not only by exon birth, but also lineage-specific loss of alternative exons and sites, and it depends on exon functionality.
- Published
- 2007
250. Global transcriptional response of Nitrosomonas europaea to chloroform and chloromethane
- Author
-
Elizabeth A. Permina, Peter J. Bottomley, Luis A. Sayavedra-Soto, Barbara O. Gvakharia, Daniel J. Arp, and Mikhail S. Gelfand
- Subjects
Subfamily ,Transcription, Genetic ,Nitrosomonas europaea ,Applied Microbiology and Biotechnology ,chemistry.chemical_compound ,Bacterial Proteins ,Transcription (biology) ,Heat shock protein ,RNA, Messenger ,Evolutionary and Genomic Microbiology ,Gene ,Chloroform ,Ecology ,biology ,Chloromethane ,Gene Expression Profiling ,Gene Expression Regulation, Bacterial ,biology.organism_classification ,Molecular biology ,Adaptation, Physiological ,Anti-Bacterial Agents ,RNA, Bacterial ,Biochemistry ,chemistry ,Methyl Chloride ,Bacteria ,Food Science ,Biotechnology - Abstract
Upon exposure of Nitrosomonas europaea to chloroform (7 μM, 1 h), transcripts for 175 of 2,460 genes were found at higher levels in treated cells than in untreated cells and transcripts for 501 genes were found at lower levels. With chloromethane (3.2 mM, 1 h), transcripts for 67 genes were at higher levels and transcripts for 148 genes were at lower levels. Transcripts for 37 genes were at higher levels following both treatments and included genes for heat shock proteins, σ-factors of the extracytoplasmic function subfamily, and toxin-antitoxin loci. N. europaea has higher levels of transcripts for a variety of defense genes when exposed to chloroform or chloromethane.
- Published
- 2007