Author: "Boris E, Shakhnovich" / Database: OpenAIRE - Searchworks@Jio Institute Digital Library Search Results

20 results on '"Boris E, Shakhnovich"'

1. Improvisation in evolution of genes and genomes: whose structure is it anyway?

Author: Eugene I. Shakhnovich and Boris E. Shakhnovich
Subjects: Structure (mathematical logic), Genetics, Genome, Proteins, Biology, Protein structure prediction, Biological Evolution, Article, Transcriptome, Structural Biology, Evolutionary biology, Gene Duplication, Gene duplication, Molecular Biology, Gene, Organism, Sequence (medicine)
Abstract: Significant progress has been made in recent years in a variety of seemingly unrelated fields such as sequencing, protein structure prediction, and high-throughput transcriptomics and metabolomics. At the same time, new microscopic models have been developed that made it possible to analyze the evolution of genes and genomes from first principles. The results from these efforts enable, for the first time, a comprehensive insight into the evolution of complex systems and organisms on all scales--from sequences to organisms and populations. Every newly sequenced genome uncovers new genes, families, and folds. Where do these new genes come from? How do gene duplication and subsequent divergence of sequence and structure affect the fitness of the organism? What role does regulation play in the evolution of proteins and folds? Emerging synergism between data and modeling provides first robust answers to these questions.
Published: 2008

2. Defining functional distance using manifold embeddings of gene ontology annotations

Author: Gilad Lerman and Boris E. Shakhnovich
Subjects: Sequence, Multidisciplinary, Theoretical computer science, Similarity (geometry), Models, Genetic, Sequence Homology, Amino Acid, Function space, Proteins, Context (language use), Function (mathematics), Expression (computer science), Biology, Bioinformatics, Manifold, Protein Structure, Tertiary, Evolution, Molecular, Structure-Activity Relationship, Kernel method, Sequence Homology, Nucleic Acid, Physical Sciences
Abstract: Although rigorous measures of similarity for sequence and structure are now well established, the problem of defining functional relationships has been particularly daunting. Here, we present several manifold embedding techniques to compute distances between Gene Ontology (GO) functional annotations and consequently estimate functional distances between protein domains. To evaluate accuracy, we correlate the functional distance to the well established measures of sequence, structural, and phylogenetic similarities. Finally, we show that manual classification of structures into folds and superfamilies is mirrored by proximity in the newly defined function space. We show how functional distances place structure–function relationships in biological context resulting in insight into divergent and convergent evolution. The methods and results in this paper can be readily generalized and applied to a wide array of biologically relevant investigations, such as accuracy of annotation transference, the relationship between sequence, structure, and function, or coherence of expression modules.
Published: 2007

3. Origins and impact of constraints in evolution of gene families

Author: Boris E. Shakhnovich and Eugene V. Koonin
Subjects: Letter, Pseudogene, Genes, Fungal, Gene Expression, Saccharomyces cerevisiae, Biology, Evolution, Molecular, Negative selection, Species Specificity, Gene Duplication, Databases, Genetic, Gene duplication, Escherichia coli, Genetics, Animals, Gene family, Selection, Genetic, Caenorhabditis elegans, Genes, Helminth, Genetics (clinical), Genes, Essential, ROC Curve, Genes, Bacterial, Essential gene, Multigene Family, Subfunctionalization, Neofunctionalization, Pseudogenes, Functional divergence, Transcription Factors
Abstract: Recent investigations of high-throughput genomic and phenomic data have uncovered a variety of significant but relatively weak correlations between a gene’s functional and evolutionary characteristics. In particular, essential genes and genes with paralogs have a slight propensity to evolve more slowly than nonessential genes and singletons, respectively. However, given the weakness and multiplicity of these associations, their biological relevance remains uncertain. Here, we show that existence of an essential paralog can be used as a specific and strong gauge of selection. We partition gene families in several genomes into two classes: those that include at least one essential gene (E-families) and those without essential genes (N-families). We find that weaker purifying selection causes N-families to evolve in a more dynamic regime with higher rates both of duplicate fixation and pseudogenization. Because genes in E-families are subject to significantly stronger purifying selection than those in N-families, they survive longer and exhibit greater sequence divergence. Longer average survival time also allows for divergence of upstream regulatory regions, resulting in change of transcriptional context among paralogs in E-families. These findings are compatible with differential division of ancestral functions (subfunctionalization) or emergence of novel functions (neofunctionalization) being the prevalent modes of evolution of paralogs in E-families as opposed to pseudogenization (nonfunctionalization), which is the typical fate of paralogs in N-families. Unlike other characteristics of genes, such as essentiality, existence of paralogs, or expression level, membership in an E-family or an N-family strongly correlates with the level of selection and appears to be a major determinant of a gene’s evolutionary fate.
Published: 2006

4. Relative contributions of structural designability and functional diversity in molecular evolution of duplicates

Author: Boris E. Shakhnovich
Subjects: Statistics and Probability, Genetics, Models, Genetic, Pseudogene, DNA Mutational Analysis, Chromosome Mapping, Genetic Variation, Locus (genetics), Evolutionary pressure, Biology, Biochemistry, Computer Science Applications, Evolution, Molecular, Computational Mathematics, Functional diversity, Fixation (population genetics), Genetics, Population, Computational Theory and Mathematics, Genes, Duplicate, Molecular evolution, Evolutionary biology, Gene duplication, Gene family, Selection, Genetic, Molecular Biology
Abstract: Analysis of increasingly saturated sequence databases has shown that gene family sizes are highly skewed, with many families being small and few containing many far-diverged homologs. Additionally, recently published results have identified a structural determinant of mutational plasticity, designability, that correlates strongly with gene family size. In this paper, we explore the possible links between the two observations, examining the possible effect of designability on duplication and divergence. We show that designability has the inverse of the expected relationship with strength of selection: more designable domains, which should have more mutational plasticity, evolve more slowly. However, we also present evidence that recently duplicated genes have variable probability of locus fixation correlated with strength of selection. As expected, paralogs under stronger evolutionary pressure have a lower failure rate. Finally, we show that probability of pseudogene formation from gene duplication can be directly tied to designability and functional flexibility of the family. We present evidence that gene families with higher designability have diverged farther because of lower probability of pseudogenization. Additionally, mutational plasticity may play an integral role by influencing pseudogenization rate. Either way, we show that considering the failure rate of duplications is integral to understanding the determinants and dynamics of molecular evolution. Contact: borya@acs.bu.edu
Published: 2006

5. Protein structure and evolutionary history determine sequence space topology

Author: Eugene I. Shakhnovich, Eric J. Deeds, Charles DeLisi, and Boris E. Shakhnovich
Subjects: Models, Molecular, Individual gene, Pseudogene, Molecular Sequence Data, Biology, Evolution, Molecular, Protein structure, Gene duplication, Genetics, Gene family, Quantitative Biology - Genomics, Letters, Amino Acid Sequence, Gene, Peptide sequence, Genetics (clinical), Coevolution, Genomics (q-bio.GN), Models, Genetic, Sequence Homology, Amino Acid, Proteins, Biomolecules (q-bio.BM), Quantitative Biology - Biomolecules, Evolutionary biology, FOS: Biological sciences, Mutation, Thermodynamics
Abstract: Understanding the observed variability in the number of homologs of a gene is a very important unsolved problem that has broad implications for research into coevolution of structure and function, gene duplication, pseudogene formation, and possibly for emerging diseases. Here, we attempt to define and elucidate some possible causes behind the observed irregularity in sequence space. We present evidence that sequence variability and functional diversity of a gene or fold family is influenced by quantifiable characteristics of the protein structure. These characteristics reflect the structural potential for sequence plasticity, i.e., the ability to accept mutation without losing thermodynamic stability. We identify a structural feature of a protein domain, contact density, that serves as a determinant of entropy in sequence space, i.e., the ability of a protein to accept mutations without destroying the fold (also known as fold designability). We show that the log of average gene family size exhibits statistical correlation (R² > 0.9) with the contact density of its three-dimensional structure. We present evidence that the sizes of individual gene families are influenced not only by the designability of the structure, but also by evolutionary history, e.g., the amount of time the gene family was in existence. We further show that our observed statistical correlation between gene family size and contact density of the structure is valid on many levels of evolutionary divergence, i.e., not only for closely related sequences, but also for less-related fold and superfamily levels of homology.
Published: 2005

6. Imprint of evolution on protein structures

Author: Guido Tiana, Eugene I. Shakhnovich, Nikolay V. Dokholyan, and Boris E. Shakhnovich
Subjects: Genetics, Multidisciplinary, Protein domain, Evolutionary algorithm, Proteins, Computational biology, Biological Sciences, Biology, Models, Biological, Evolution, Molecular, Divergent evolution, Protein structure, Lattice (order), Gene duplication, Thermodynamics, Evolutionary dynamics, Gene, Algorithms
Abstract: We attempt to understand the evolutionary origin of protein folds by simulating their divergent evolution with a three-dimensional lattice model. Starting from an initial seed lattice structure, evolution of model proteins progresses by sequence duplication and subsequent point mutations. A new gene's ability to fold into a stable and unique structure is tested each time through direct kinetic folding simulations. Where possible, the algorithm accepts the new sequence and structure and thus a “new protein structure” is born. During the course of each run, this model evolutionary algorithm provides several thousand new proteins with diverse structures. Analysis of evolved structures shows that later evolved structures are more designable than seed structures as judged by recently developed structural determinant of protein designability, as well as direct estimate of designability for selected structures by thermodynamic sampling of their sequence space. We test the significance of this trend predicted on lattice models on real proteins and show that protein domains that are found in eukaryotic organisms only feature statistically significant higher designability than their prokaryotic counterparts. These results present a fundamental view on protein evolution highlighting the relative roles of structural selection and evolutionary dynamics on genesis of modern proteins.
Published: 2004

7. Natural selection of more designable folds: A mechanism for thermophilic adaptation

Author: Eugene I. Shakhnovich, Jeremy L. England, and Boris E. Shakhnovich
Subjects: Protein Folding, Multidisciplinary, Natural selection, Proteome, Ecology, Thermophile, Biophysics, Temperature, Proteins, Bacterial genome size, Fold (geology), Biological Sciences, Biology, Adaptation, Physiological, Biophysical Phenomena, Quantitative measure, Evolutionary biology, Thermodynamics, Protein folding, Selection, Genetic
Abstract: An open question of great interest in biophysics is whether variations in structure cause protein folds to differ in the number of amino acid sequences that can fold to them stably, i.e., in their designability. Recently, we have shown that a novel quantitative measure of a fold's tertiary topology, called its contact trace, strongly correlates with the fold's designability. Here, we investigate the relationship between a fold's contact trace and its relative frequency of usage in mesophilic vs. thermophilic eubacteria. We observe that thermophilic organisms exhibit a bias toward using folds of higher contact trace when compared with mesophiles. We establish this difference both for the distributions of folds at the whole-proteome level and also through more focused structural comparisons of orthologous proteins. Our findings suggest that thermophilic adaptation in bacterial genomes occurs in part through natural selection of more designable folds, pointing to designability as a key component of protein fitness.
Published: 2003

8. Functional Fingerprints of Folds: Evidence for Correlated Structure–Function Evolution

Author: Eugene I. Shakhnovich, Nikolay V. Dokholyan, Boris E. Shakhnovich, and Charles DeLisi
Subjects: Proteomics, Protein Folding, Structural similarity, Protein domain, Computational biology, Biology, Bioinformatics, Domain (software engineering), Evolution, Molecular, Structure-Activity Relationship, Adenosine Triphosphate, Structural Biology, Cluster (physics), Databases, Protein, Divergence (statistics), Cluster analysis, Molecular Biology, Computational Biology, Proteins, Enzymes, Protein Structure, Tertiary, Divergent evolution, Graph (abstract data type), Guanosine Triphosphate, Protein Binding, Signal Transduction
Abstract: Using structural similarity clustering of protein domains: protein domain universe graph (PDUG), and a hierarchical functional annotation: gene ontology (GO) as two evolutionary lenses, we find that each structural cluster (domain fold) exhibits a distribution of functions that is unique to it. These functional distributions are functional fingerprints that are specific to characteristic structural clusters and vary from cluster to cluster. Furthermore, as structural similarity threshold for domain clustering in the PDUG is relaxed we observe an influx of earlier-diverged domains into clusters. These domains join clusters without destroying the functional fingerprint. These results can be understood in light of a divergent evolution scenario that posits correlated divergence of structural and functional traits in protein domains from one or few progenitors.
Published: 2003

9. A first-principles model of early evolution: emergence of gene families, species, and preferred protein folds

Author: Konstantin B. Zeldovich, Eugene I. Shakhnovich, Peiqiu Chen, and Boris E. Shakhnovich
Subjects: Protein Folding, Protein Conformation, Protein structure, Sequence Analysis, Protein, Databases, Protein, lcsh:QH301-705.5, Genetics, education.field_of_study, Quantitative Biology::Biomolecules, Genome, Ecology, Biological Evolution, Computational Theory and Mathematics, Multigene Family, Modeling and Simulation, Mutation (genetic algorithm), Viruses, Protein folding, Research Article, Protein family, Population, Biophysics, Sequence alignment, Biology, Structure-Activity Relationship, Cellular and Molecular Neuroscience, Life Expectancy, Exponential growth, Gene family, Animals, Humans, Computer Simulation, Selection, Genetic, Quantitative Biology - Populations and Evolution, education, Population Growth, Molecular Biology, Ecology, Evolution, Behavior and Systematics, Chronobiology Phenomena, Stochastic Processes, Models, Genetic, Sequence Homology, Amino Acid, Populations and Evolution (q-bio.PE), Proteins, Computational Biology, Biomolecules (q-bio.BM), Archaea, Quantitative Biology - Biomolecules, lcsh:Biology (General), Evolutionary biology, FOS: Biological sciences, Mutation, Sequence Alignment
Abstract: In this work we develop a microscopic physical model of early evolution, where phenotype, organism life expectancy, is directly related to genotype, the stability of its proteins in their native conformations, which can be determined exactly in the model. Simulating the model on a computer, we consistently observe the Big Bang scenario whereby exponential population growth ensues as soon as favorable sequence-structure combinations (precursors of stable proteins) are discovered. Upon that, random diversity of the structural space abruptly collapses into a small set of preferred proteins. We observe that protein folds remain stable and abundant in the population at time scales much greater than mutation or organism lifetime, and the distribution of the lifetimes of dominant folds in a population approximately follows a power law. The separation of evolutionary time scales between discovery of new folds and generation of new sequences gives rise to emergence of protein families and superfamilies whose sizes are power-law distributed, closely matching the same distributions for real proteins. On the population level we observe emergence of species, subpopulations which carry similar genomes. Further, we present a simple theory that relates stability of evolving proteins to the sizes of emerging genomes. Together, these results provide a microscopic first-principles picture of how the first gene families developed in the course of early evolution. Comment: In press, PLoS Computational Biology
Published: 2007

10. Structural similarity enhances interaction propensity of proteins

Author: Eugene I. Shakhnovich, David B. Lukatsky, Boris E. Shakhnovich, and Julian Mintseris
Subjects: Models, Molecular, Structural similarity, Computational biology, Plasma protein binding, Biology, DNA-binding protein, Article, Protein–protein interaction, Evolution, Molecular, Structure-Activity Relationship, Bacterial Proteins, Structural Biology, Structure–activity relationship, Amino Acids, Databases, Protein, Molecular Biology, chemistry.chemical_classification, Proteins, Biomolecules (q-bio.BM), Evolutionary pressure, Amino acid, DNA-Binding Proteins, chemistry, Biochemistry, Quantitative Biology - Biomolecules, Structural Homology, Protein, FOS: Biological sciences, Statistical Prevalence, Dimerization, Protein Binding
Abstract: We study statistical properties of interacting protein-like surfaces and predict two strong, related effects: (i) statistically enhanced self-attraction of proteins; (ii) statistically enhanced attraction of proteins with similar structures. The effects originate in the fact that the probability to find a pattern self-match between two identical, even randomly organized interacting protein surfaces is always higher compared with the probability for a pattern match between two different, promiscuous protein surfaces. This theoretical finding explains statistical prevalence of homodimers in protein-protein interaction networks reported earlier. Further, our findings are confirmed by the analysis of curated database of protein complexes that showed highly statistically significant overrepresentation of dimers formed by structurally similar proteins with highly divergent sequences ("superfamily heterodimers"). We suggest that promiscuous homodimeric interactions pose strong competitive interactions for heterodimers evolved from homodimers. Such evolutionary bottleneck is overcome using the negative design evolutionary pressure applied against promiscuous homodimer formation. This is achieved through the formation of highly specific contacts formed by charged residues as demonstrated both in model and real superfamily heterodimers.
Published: 2006

11. Assessing transcription factor motif drift from noisy decoy sequences

Author: Timothy E. Reddy, Charles DeLisi, and Boris E. Shakhnovich
Subjects: DNA-Binding Proteins, Chromatin Immunoprecipitation, Binding Sites, Base Sequence, Computational Biology, DNA, Saccharomyces cerevisiae, Genome, Fungal, Microarray Analysis, Promoter Regions, Genetic, Algorithms, Protein Binding, Transcription Factors
Abstract: Genome-scale identification of transcription factor binding sites (TFBS) is fundamental to understanding the complexities of mRNA expression at both the cell and organismal levels. While high-throughput experimental methods provide associations between transcription factors and the genes they regulate under a specified experimental condition, computational methods are still required to pinpoint the exact location of binding. Moreover, since the binding site is an intrinsic property of the promoter region, computational methods are in principle more general than condition-dependent experimental methods. Computational identification of TFBSs is complicated in at least two different ways. First, transcription factors bind a heterogeneous distribution of sites and therefore have a distribution of affinities. Second, the sequences for which a common site is to be determined do not all have a site for the TF of interest. In this paper, we evaluate the robustness of TFBS identification with respect to both effects. We show that the addition of upstream regions that do not have the TFBS destroys the specificity of the predicted binding site. We also propose a method to calculate the distance between position weight matrices that can be used to measure "drift" from the canonical binding site. The results presented here could be useful in developing future transcription factor binding site identification algorithms.
Published: 2005

12. Improving the precision of the structure-function relationship by considering phylogenetic context

Author: Boris E. Shakhnovich
Subjects: Information retrieval, Ecology, Phylogenetic tree, Computer science, business.industry, Structure function, Correction, Context (language use), Cellular and Molecular Neuroscience, Text mining, Computational Theory and Mathematics, lcsh:Biology (General), Modeling and Simulation, Genetics, business, Molecular Biology, lcsh:QH301-705.5, Ecology, Evolution, Behavior and Systematics
Published: 2005

13. ELISA: a unified, multidimensional view of the protein domain universe

Author: Boris E. Shakhnovich, John Max Harvey, and Charles DeLisi
Subjects: Evolution, Molecular, Models, Genetic, Molecular Sequence Data, Proteins, Amino Acid Sequence, Databases, Protein, Phylogeny
Abstract: ELISA (http://romi.bu.edu/elisa/) is a database that was designed for flexibility in defining interesting queries about protein domain evolution. We have defined and included both the inherent characteristics of the domains such as structure and function and comparisons of these characteristics between domains. Thus, the database is useful in defining structural and functional links between related protein domains and by extension sequences that encode them. In this database we introduce and employ a novel method of functional annotation and comparison. For each protein domain we create a probabilistic functional annotation tree using GO. We have designed an algorithm that accurately compares these trees and thus provides a measure of "functional distance" between two protein domains. Along with functional annotation, we have also included structural comparison between protein domains and best sequence comparisons to all known genomes. The latter enables researchers to dynamically do searches for domains sharing similar phylogenetic profiles. This combination of data and tools enables the researcher to design complex queries to carry out research in the areas of protein domain evolution, structure prediction and functional annotation of novel sequences.
Published: 2005

14. Comparisons of predicted genetic modules: identification of co-expressed genes through module gene flow

Author: Boris E. Shakhnovich, Timothy E. Reddy, Kevin Galinsky, Joseph Mellor, and Charles DeLisi
Subjects: Genes, Models, Genetic, Gene Expression Profiling, Databases, Genetic, Sequence Analysis, DNA, Algorithms
Abstract: A question of fundamental importance is the definition and identification of modules from microarray experiments. A wide variety of techniques have been used to gain insight into the elucidation of such modules. One problem, however, is the inability to directly compare results between the different data sets produced due to the inherent parameterizations of their approaches. We first aim to provide a mechanism by which different approaches to module finding can be directly compared. Moreover, the same approach can be used to internally compare the modules predicted by the same technique, but at different parameterizations. We apply this approach to analyze the flow of genes through modules at different module thresholds of the Barkai Signature method, thereby further resolving the modules into sets of co-expressed genes.
Published: 2005

15. Quantifying structure-function uncertainty: a graph theoretical exploration into the origins and limitations of protein annotation

Author: J. Max Harvey and Boris E. Shakhnovich
Subjects: Theoretical computer science, Function space, Computer science, Biochemical Phenomena, Protein domain, Graph theory, Bioinformatics, Structural genomics, Annotation, Structure-Activity Relationship, Corollary, Protein Annotation, Structural Biology, Terminology as Topic, Computer Graphics, Entropy (information theory), Molecular Biology
Abstract: Since the advent of investigations into structural genomics, research has focused on correctly identifying domain boundaries, as well as domain similarities and differences in the context of their evolutionary relationships. As the science of structural genomics ramps up adding more and more information into the databanks, questions about the accuracy and completeness of our classification and annotation systems appear on the forefront of this research. A central question of paramount importance is how structural similarity relates to functional similarity. Here, we begin to rigorously and quantitatively answer these questions by first exploring the consensus between the most common protein domain structure annotation databases CATH, SCOP and FSSP. Each of these databases explores the evolutionary relationships between protein domains using a combination of automatic and manual, structural and functional, continuous and discrete similarity measures. In order to examine the issue of consensus thoroughly, we build a generalized graph out of each of these databases and hierarchically cluster these graphs at interval thresholds. We then employ a distance measure to find regions of greatest overlap. Using this procedure we were able not only to enumerate the level of consensus between the different annotation systems, but also to define the graph-theoretical origins behind the annotation schema of class, family and superfamily by observing that the same thresholds that define the best consensus regions between FSSP, SCOP and CATH correspond to distinct, non-random phase-transitions in the structure comparison graph itself. To investigate the correspondence in divergence between structure and function further, we introduce a measure of functional entropy that calculates divergence in function space. First, we use this measure to calculate the general correlation between structural homology and functional proximity. We extend this analysis further by quantitatively calculating the average amount of functional information gained from our understanding of structural distance and the corollary inherent uncertainty that represents the theoretical limit of our ability to infer function from structural similarity. Finally we show how our measure of functional "entropy" translates into a more intuitive concept of functional annotation into similarity EC classes.
Published: 2003

16. Expanding protein universe and its origin from the biological Big Bang

Author: Eugene I. Shakhnovich, Nikolay V. Dokholyan, and Boris E. Shakhnovich
Subjects: Big Bang, Models, Molecular, Multidisciplinary, media_common.quotation_subject, Proteins, Biology, Biological Sciences, Universe, Protein Structure, Tertiary, Evolution, Molecular, Molecular evolution, Abiogenesis, Evolutionary biology, Computer Simulation, media_common
Abstract: The bottom-up approach to understanding the evolution of organisms is by studying molecular evolution. With the large number of protein structures identified in the past decades, we have discovered peculiar patterns that nature imprints on protein structural space in the course of evolution. In particular, we have discovered that the universe of protein structures is organized hierarchically into a scale-free network. By understanding the cause of these patterns, we attempt to glance at the very origin of life.
Published: 2002

17. Binding Site Graphs: A New Graph Theoretical Framework for Prediction of Transcription Factor Binding Sites

Author: Charles DeLisi, Timothy E. Reddy, and Boris E. Shakhnovich
Subjects: Systems biology, Molecular Sequence Data, Sequence alignment, Computational biology, Biology, Saccharomyces, Cellular and Molecular Neuroscience, Genetics, Computer Simulation, Binding site, Promoter Regions, Genetic, Cluster analysis, lcsh:QH301-705.5, Molecular Biology, Transcription factor, Ecology, Evolution, Behavior and Systematics, Binding selectivity, Binding Sites, Base Sequence, Models, Genetic, Ecology, Computational Biology, Genetics and Genomics, Promoter, Sequence Analysis, DNA, DNA binding site, lcsh:Biology (General), Models, Chemical, Computational Theory and Mathematics, Modeling and Simulation, Algorithms, Mathematics, Protein Binding, Transcription Factors, Research Article
Abstract: Computational prediction of nucleotide binding specificity for transcription factors remains a fundamental and largely unsolved problem. Determination of binding positions is a prerequisite for research in gene regulation, a major mechanism controlling phenotypic diversity. Furthermore, an accurate determination of binding specificities from high-throughput data sources is necessary to realize the full potential of systems biology. Unfortunately, recently performed independent evaluation showed that more than half the predictions from most widely used algorithms are false. We introduce a graph-theoretical framework to describe local sequence similarity as the pair-wise distances between nucleotides in promoter sequences, and hypothesize that densely connected subgraphs are indicative of transcription factor binding sites. Using a well-established sampling algorithm coupled with simple clustering and scoring schemes, we identify sets of closely related nucleotides and test those for known TF binding activity. Using an independent benchmark, we find our algorithm predicts yeast binding motifs considerably better than currently available techniques and without manual curation. Importantly, we reduce the number of false positive predictions in yeast to less than 30%. We also develop a framework to evaluate the statistical significance of our motif predictions. We show that our approach is robust to the choice of input promoters, and thus can be used in the context of predicting binding positions from noisy experimental data. We apply our method to identify binding sites using data from genome scale ChIP–chip experiments. Results from these experiments are publicly available at http://cagt10.bu.edu/BSG. The graphical framework developed here may be useful when combining predictions from numerous computational and experimental measures. Finally, we discuss how our algorithm can be used to improve the sensitivity of computational predictions of transcription factor binding specificities.
Author Summary: A historically difficult problem in computational biology is the identification of transcription factor binding sites (TFBS) in the promoters of co-regulated genes. With increasing emphasis on research in transcriptional regulation, this problem is also uniquely relevant to emerging results from recent experiments in high-throughput and systems biology. Despite extensive research in the area, recent evaluations of previously published techniques show much room for improvement. In this paper, we introduce a fundamentally new approach to the identification of TFBS. First, we start by representing nucleotides in promoters as an undirected, weighted graph. Given this representation of a binding site graph (BSG), we employ relatively simple graph clustering techniques to identify functional TFBS. We show that BSG predictions significantly outperform all previously evaluated methods in nearly every performance measure using a standardized assessment benchmark. We also find that this approach is more robust than traditional Gibbs sampling to selection of input promoters, and thus more likely to perform well under noisy experimental conditions. Finally, BSGs are very good at predicting specificity determining nucleotides. Using BSG predictions, we were able to confirm recent experimental results on binding specificity of E-box TFs CBF1 and PHO4 and predict novel specificity determining nucleotides for TYE7.
Published: 2007
Full Text: View/download PDF

18. Positional clustering improves computational binding site detection and identifies novel cis-regulatory sites in mammalian GABA(A) receptor subunit genes

Author: Shelley J. Russek, Charles DeLisi, Boris E. Shakhnovich, Timothy E. Reddy, and Daniel S. Roberts
Subjects: Protein subunit, Saccharomyces cerevisiae, Biology, gamma-Aminobutyric acid, Medical and health sciences, Mice, Clinical medicine, Neurotransmitter receptor, Genetics, Transcriptional regulation, Animals, Cluster Analysis, Binding site, Promoter Regions, Genetic, Gene, Transcription factor, Cells, Cultured, Developmental biology, Neurons, Health sciences, Binding Sites, GABAA receptor, Computational Biology, Sequence Analysis, DNA, Receptors, GABA-A, Rats, Protein Subunits, Methods Online, Neurology & neurosurgery, Algorithms, Transcription Factors
Abstract: Understanding transcription factor (TF) mediated control of gene expression remains a major challenge at the interface of computational and experimental biology. Computational techniques predicting TF-binding site specificity are frequently unreliable. On the other hand, comprehensive experimental validation is difficult and time consuming. We introduce a simple strategy that dramatically improves robustness and accuracy of computational binding site prediction. First, we evaluate the rate of recurrence of computational TFBS predictions by commonly used sampling procedures. We find that the vast majority of results are biologically meaningless. However clustering results based on nucleotide position improves predictive power. Additionally, we find that positional clustering increases robustness to long or imperfectly selected input sequences. Positional clustering can also be used as a mechanism to integrate results from multiple sampling approaches for improvements in accuracy over each one alone. Finally, we predict and validate regulatory sequences partially responsible for transcriptional control of the mammalian type A gamma-aminobutyric acid receptor (GABA(A)R) subunit genes. Positional clustering is useful for improving computational binding site predictions, with potential application to improving our understanding of mammalian gene expression. In particular, predicted regulatory mechanisms in the mammalian GABA(A)R subunit gene family may open new avenues of research towards understanding this pharmacologically important neurotransmitter receptor system.
Published: 2007
Full Text: View/download PDF

19. Improving the Precision of the Structure–Function Relationship by Considering Phylogenetic Context

Author: Boris E. Shakhnovich
Subjects: Protein domain, Genetics/Functional Genomics, Context (language use), Computational biology, Biology, Structural genomics, Cellular and Molecular Neuroscience, Annotation, Simple (abstract algebra), Similarity (psychology), Genetics, Function (engineering), Molecular Biology, Ecology, Evolution, Behavior and Systematics, Ecology, Structure-Function, Molecular Biology - Structural Biology, Variable (computer science), Biology (General), Computational Theory and Mathematics, Modeling and Simulation, PDUG, Structural Genomics, Data mining, Bioinformatics - Computational Biology, Research Article
Abstract: Understanding the relationship between protein structure and function is one of the foremost challenges in post-genomic biology. Higher conservation of structure could, in principle, allow researchers to extend current limitations of annotation. However, despite significant research in the area, a precise and quantitative relationship between biochemical function and protein structure has been elusive. Attempts to draw an unambiguous link have often been complicated by pleiotropy, variable transcriptional control, and adaptations to genomic context, all of which adversely affect simple definitions of function. In this paper, I report that integrating genomic information can be used to clarify the link between protein structure and function. First, I present a novel measure of functional proximity between protein structures (F-score). Then, using F-score and other entirely automatic methods measuring structure and phylogenetic similarity, I present a three-dimensional landscape describing their inter-relationship. The result is a “well-shaped” landscape that demonstrates the added value of considering genomic context in inferring function from structural homology. A generalization of methodology presented in this paper can be used to improve the precision of annotation of genes in current and newly sequenced genomes.
Synopsis: The author provides a novel perspective on a key problem of structural biology: the structure–function relationship in proteins. While relatedness in protein structure correlates with general description of function, attempts to use this relationship predictively are often complicated by its ambiguous nature. A structure encoded by a family of sequences may be implicated in a set of diverse functions across a variety of organisms. The author outlines an innovative approach that underlines the importance of considering genomic context when using structure-comparison methods for functional prediction. First, the author defines two distance measures: in genomic space and in function space. Then, the author describes a landscape of functional distance based on both structural and phylogenetic relatedness. It turns out that this landscape forms a “functional well” where proximity occurs when the structures are similar and occur in the same set of genomes. This result may have implications in future research into functional prediction. With the increasing pace of sequence deposition into databanks, this result suggests a simple way to improve functional prediction via structure homology by complementing existing methods with emerging techniques from comparative genomics.
Published: 2005
Full Text: View/download PDF

20. [Untitled]

Author: John Max Harvey, Boris E. Shakhnovich, David R. Lorenz, Eugene I. Shakhnovich, Steve Comeau, and Charles DeLisi
Subjects: Genetics, Structure (mathematical logic), Theoretical computer science, Applied Mathematics, Probabilistic logic, Online database, Inference, Biology, ENCODE, Biochemistry, Computer Science Applications, Structural genomics, Set (abstract data type), Structural Biology, Graph (abstract data type), Molecular Biology
Abstract: The problem of functional annotation based on homology modeling is primary to current bioinformatics research. Researchers have noted regularities in sequence, structure and even chromosome organization that allow valid functional cross-annotation. However, these methods provide a lot of false negatives due to limited specificity inherent in the system. We want to create an evolutionarily inspired organization of data that would approach the issue of structure-function correlation from a new, probabilistic perspective. Such organization has possible applications in phylogeny, modeling of functional evolution and structural determination. ELISA (Evolutionary Lineage Inferred from Structural Analysis, http://romi.bu.edu/elisa) is an online database that combines functional annotation with structure and sequence homology modeling to place proteins into sequence-structure-function "neighborhoods". The atomic unit of the database is a set of sequences and structural templates that those sequences encode. A graph that is built from the structural comparison of these templates is called PDUG (protein domain universe graph). We introduce a method of functional inference through a probabilistic calculation done on an arbitrary set of PDUG nodes. Further, all PDUG structures are mapped onto all fully sequenced proteomes allowing an easy interface for evolutionary analysis and research into comparative proteomics. ELISA is the first database with applicability to evolutionary structural genomics explicitly in mind. Availability: The database is available at http://romi.bu.edu/elisa.
Published: 2003
Full Text: View/download PDF
