20 results on '"Boris E, Shakhnovich"'
Search Results
2. Binding Site Graphs: A New Graph Theoretical Framework for Prediction of Transcription Factor Binding Sites.
- Author
-
Timothy E. Reddy, Charles DeLisi, and Boris E. Shakhnovich
- Published
- 2007
- Full Text
- View/download PDF
3. A First-Principles Model of Early Evolution: Emergence of Gene Families, Species, and Preferred Protein Folds.
- Author
-
Konstantin B. Zeldovich, Peiqiu Chen, Boris E. Shakhnovich, and Eugene I. Shakhnovich
- Published
- 2007
- Full Text
- View/download PDF
4. Correction: Improving the Precision of the Structure-Function Relationship by Considering Phylogenetic Context.
- Author
-
Boris E. Shakhnovich
- Published
- 2005
- Full Text
- View/download PDF
5. Improving the Precision of the Structure-Function Relationship by Considering Phylogenetic Context.
- Author
-
Boris E. Shakhnovich
- Published
- 2005
- Full Text
- View/download PDF
6. ELISA: Structure-Function Inferences based on statistically significant and evolutionarily inspired observations.
- Author
-
Boris E. Shakhnovich, John M. Harvey, Steve Comeau, David Lorenz, Charles DeLisi, and Eugene I. Shakhnovich
- Published
- 2003
- Full Text
- View/download PDF
7. Improvisation in evolution of genes and genomes: whose structure is it anyway?
- Author
-
Eugene I. Shakhnovich and Boris E. Shakhnovich
- Subjects
Structure (mathematical logic); Genetics; Genome; Proteins; Biology; Protein structure prediction; Biological Evolution; Article; Transcriptome; Structural Biology; Evolutionary biology; Gene Duplication; Gene duplication; Molecular Biology; Gene; Organism; Sequence (medicine)
- Abstract
Significant progress has been made in recent years in a variety of seemingly unrelated fields such as sequencing, protein structure prediction, and high-throughput transcriptomics and metabolomics. At the same time, new microscopic models have been developed that made it possible to analyze the evolution of genes and genomes from first principles. The results from these efforts enable, for the first time, a comprehensive insight into the evolution of complex systems and organisms on all scales--from sequences to organisms and populations. Every newly sequenced genome uncovers new genes, families, and folds. Where do these new genes come from? How do gene duplication and subsequent divergence of sequence and structure affect the fitness of the organism? What role does regulation play in the evolution of proteins and folds? Emerging synergism between data and modeling provides first robust answers to these questions.
- Published
- 2008
- Full Text
- View/download PDF
8. Defining functional distance using manifold embeddings of gene ontology annotations
- Author
-
Gilad Lerman and Boris E. Shakhnovich
- Subjects
Sequence; Multidisciplinary; Theoretical computer science; Similarity (geometry); Models, Genetic; Sequence Homology, Amino Acid; Function space; Proteins; Context (language use); Function (mathematics); Expression (computer science); Biology; Bioinformatics; Manifold; Protein Structure, Tertiary; Evolution, Molecular; Structure-Activity Relationship; Kernel method; Sequence Homology, Nucleic Acid; Physical Sciences
- Abstract
Although rigorous measures of similarity for sequence and structure are now well established, the problem of defining functional relationships has been particularly daunting. Here, we present several manifold embedding techniques to compute distances between Gene Ontology (GO) functional annotations and consequently estimate functional distances between protein domains. To evaluate accuracy, we correlate the functional distance to the well established measures of sequence, structural, and phylogenetic similarities. Finally, we show that manual classification of structures into folds and superfamilies is mirrored by proximity in the newly defined function space. We show how functional distances place structure–function relationships in biological context resulting in insight into divergent and convergent evolution. The methods and results in this paper can be readily generalized and applied to a wide array of biologically relevant investigations, such as accuracy of annotation transference, the relationship between sequence, structure, and function, or coherence of expression modules.
- Published
- 2007
- Full Text
- View/download PDF
9. Origins and impact of constraints in evolution of gene families
- Author
-
Boris E. Shakhnovich and Eugene V. Koonin
- Subjects
Letter; Pseudogene; Genes, Fungal; Gene Expression; Saccharomyces cerevisiae; Biology; Evolution, Molecular; Negative selection; Species Specificity; Gene Duplication; Databases, Genetic; Gene duplication; Escherichia coli; Genetics; Animals; Gene family; Selection, Genetic; Caenorhabditis elegans; Genes, Helminth; Genetics (clinical); Genes, Essential; ROC Curve; Genes, Bacterial; Essential gene; Multigene Family; Subfunctionalization; Neofunctionalization; Pseudogenes; Functional divergence; Transcription Factors
- Abstract
Recent investigations of high-throughput genomic and phenomic data have uncovered a variety of significant but relatively weak correlations between a gene’s functional and evolutionary characteristics. In particular, essential genes and genes with paralogs have a slight propensity to evolve more slowly than nonessential genes and singletons, respectively. However, given the weakness and multiplicity of these associations, their biological relevance remains uncertain. Here, we show that existence of an essential paralog can be used as a specific and strong gauge of selection. We partition gene families in several genomes into two classes: those that include at least one essential gene (E-families) and those without essential genes (N-families). We find that weaker purifying selection causes N-families to evolve in a more dynamic regime with higher rates both of duplicate fixation and pseudogenization. Because genes in E-families are subject to significantly stronger purifying selection than those in N-families, they survive longer and exhibit greater sequence divergence. Longer average survival time also allows for divergence of upstream regulatory regions, resulting in change of transcriptional context among paralogs in E-families. These findings are compatible with differential division of ancestral functions (subfunctionalization) or emergence of novel functions (neofunctionalization) being the prevalent modes of evolution of paralogs in E-families as opposed to pseudogenization (nonfunctionalization), which is the typical fate of paralogs in N-families. Unlike other characteristics of genes, such as essentiality, existence of paralogs, or expression level, membership in an E-family or an N-family strongly correlates with the level of selection and appears to be a major determinant of a gene’s evolutionary fate.
- Published
- 2006
- Full Text
- View/download PDF
10. Protein structure and evolutionary history determine sequence space topology
- Author
-
Eugene I. Shakhnovich, Eric J. Deeds, Charles DeLisi, and Boris E. Shakhnovich
- Subjects
Models, Molecular; Individual gene; Pseudogene; Molecular Sequence Data; Biology; Evolution, Molecular; Protein structure; Gene duplication; Genetics; Gene family; Quantitative Biology - Genomics; Letters; Amino Acid Sequence; Gene; Peptide sequence; Genetics (clinical); Coevolution; Genomics (q-bio.GN); Models, Genetic; Sequence Homology, Amino Acid; Proteins; Biomolecules (q-bio.BM); Quantitative Biology - Biomolecules; Evolutionary biology; FOS: Biological sciences; Mutation; Thermodynamics
- Abstract
Understanding the observed variability in the number of homologs of a gene is a very important unsolved problem that has broad implications for research into coevolution of structure and function, gene duplication, pseudogene formation, and possibly for emerging diseases. Here, we attempt to define and elucidate some possible causes behind the observed irregularity in sequence space. We present evidence that sequence variability and functional diversity of a gene or fold family is influenced by quantifiable characteristics of the protein structure. These characteristics reflect the structural potential for sequence plasticity, i.e., the ability to accept mutation without losing thermodynamic stability. We identify a structural feature of a protein domain—contact density—that serves as a determinant of entropy in sequence space, i.e., the ability of a protein to accept mutations without destroying the fold (also known as fold designability). We show that the logarithm of the average gene family size exhibits a statistical correlation (R² > 0.9) with the contact density of its three-dimensional structure. We present evidence that the size of individual gene families is influenced not only by the designability of the structure, but also by evolutionary history, e.g., the amount of time the gene family was in existence. We further show that our observed statistical correlation between gene family size and contact density of the structure is valid on many levels of evolutionary divergence, i.e., not only for closely related sequences, but also for less-related fold and superfamily levels of homology.
- Published
- 2005
- Full Text
- View/download PDF
11. Imprint of evolution on protein structures
- Author
-
Guido Tiana, Eugene I. Shakhnovich, Nikolay V. Dokholyan, and Boris E. Shakhnovich
- Subjects
Genetics; Multidisciplinary; Protein domain; Evolutionary algorithm; Proteins; Computational biology; Biological Sciences; Biology; Models, Biological; Evolution, Molecular; Divergent evolution; Protein structure; Lattice (order); Gene duplication; Thermodynamics; Evolutionary dynamics; Gene; Algorithms
- Abstract
We attempt to understand the evolutionary origin of protein folds by simulating their divergent evolution with a three-dimensional lattice model. Starting from an initial seed lattice structure, evolution of model proteins progresses by sequence duplication and subsequent point mutations. A new gene's ability to fold into a stable and unique structure is tested each time through direct kinetic folding simulations. Where possible, the algorithm accepts the new sequence and structure and thus a “new protein structure” is born. During the course of each run, this model evolutionary algorithm provides several thousand new proteins with diverse structures. Analysis of evolved structures shows that later evolved structures are more designable than seed structures as judged by recently developed structural determinant of protein designability, as well as direct estimate of designability for selected structures by thermodynamic sampling of their sequence space. We test the significance of this trend predicted on lattice models on real proteins and show that protein domains that are found in eukaryotic organisms only feature statistically significant higher designability than their prokaryotic counterparts. These results present a fundamental view on protein evolution highlighting the relative roles of structural selection and evolutionary dynamics on genesis of modern proteins.
- Published
- 2004
- Full Text
- View/download PDF
12. Natural selection of more designable folds: A mechanism for thermophilic adaptation
- Author
-
Eugene I. Shakhnovich, Jeremy L. England, and Boris E. Shakhnovich
- Subjects
Protein Folding; Multidisciplinary; Natural selection; Proteome; Ecology; Thermophile; Biophysics; Temperature; Proteins; Bacterial genome size; Fold (geology); Biological Sciences; Biology; Adaptation, Physiological; Biophysical Phenomena; Quantitative measure; Evolutionary biology; Thermodynamics; Protein folding; Selection, Genetic
- Abstract
An open question of great interest in biophysics is whether variations in structure cause protein folds to differ in the number of amino acid sequences that can fold to them stably, i.e., in their designability. Recently, we have shown that a novel quantitative measure of a fold's tertiary topology, called its contact trace, strongly correlates with the fold's designability. Here, we investigate the relationship between a fold's contact trace and its relative frequency of usage in mesophilic vs. thermophilic eubacteria. We observe that thermophilic organisms exhibit a bias toward using folds of higher contact trace when compared with mesophiles. We establish this difference both for the distributions of folds at the whole-proteome level and also through more focused structural comparisons of orthologous proteins. Our findings suggest that thermophilic adaptation in bacterial genomes occurs in part through natural selection of more designable folds, pointing to designability as a key component of protein fitness.
- Published
- 2003
- Full Text
- View/download PDF
13. Functional Fingerprints of Folds: Evidence for Correlated Structure–Function Evolution
- Author
-
Eugene I. Shakhnovich, Nikolay V. Dokholyan, Boris E. Shakhnovich, and Charles DeLisi
- Subjects
Proteomics; Protein Folding; Structural similarity; Protein domain; Computational biology; Biology; Bioinformatics; Domain (software engineering); Evolution, Molecular; Structure-Activity Relationship; Adenosine Triphosphate; Structural Biology; Cluster (physics); Databases, Protein; Divergence (statistics); Cluster analysis; Molecular Biology; Computational Biology; Proteins; Enzymes; Protein Structure, Tertiary; Divergent evolution; Graph (abstract data type); Guanosine Triphosphate; Protein Binding; Signal Transduction
- Abstract
Using structural similarity clustering of protein domains: protein domain universe graph (PDUG), and a hierarchical functional annotation: gene ontology (GO) as two evolutionary lenses, we find that each structural cluster (domain fold) exhibits a distribution of functions that is unique to it. These functional distributions are functional fingerprints that are specific to characteristic structural clusters and vary from cluster to cluster. Furthermore, as structural similarity threshold for domain clustering in the PDUG is relaxed we observe an influx of earlier-diverged domains into clusters. These domains join clusters without destroying the functional fingerprint. These results can be understood in light of a divergent evolution scenario that posits correlated divergence of structural and functional traits in protein domains from one or few progenitors.
- Published
- 2003
- Full Text
- View/download PDF
14. Structural similarity enhances interaction propensity of proteins
- Author
-
Eugene I. Shakhnovich, David B. Lukatsky, Boris E. Shakhnovich, and Julian Mintseris
- Subjects
Models, Molecular; Structural similarity; Computational biology; Plasma protein binding; Biology; DNA-binding protein; Article; Protein–protein interaction; Evolution, Molecular; Structure-Activity Relationship; Bacterial Proteins; Structural Biology; Structure–activity relationship; Amino Acids; Databases, Protein; Molecular Biology; Proteins; Biomolecules (q-bio.BM); Evolutionary pressure; Amino acid; DNA-Binding Proteins; Biochemistry; Quantitative Biology - Biomolecules; Structural Homology, Protein; FOS: Biological sciences; Statistical Prevalence; Dimerization; Protein Binding
- Abstract
We study statistical properties of interacting protein-like surfaces and predict two strong, related effects: (i) statistically enhanced self-attraction of proteins; (ii) statistically enhanced attraction of proteins with similar structures. The effects originate in the fact that the probability to find a pattern self-match between two identical, even randomly organized interacting protein surfaces is always higher compared with the probability for a pattern match between two different, promiscuous protein surfaces. This theoretical finding explains statistical prevalence of homodimers in protein-protein interaction networks reported earlier. Further, our findings are confirmed by the analysis of curated database of protein complexes that showed highly statistically significant overrepresentation of dimers formed by structurally similar proteins with highly divergent sequences ("superfamily heterodimers"). We suggest that promiscuous homodimeric interactions pose strong competitive interactions for heterodimers evolved from homodimers. Such evolutionary bottleneck is overcome using the negative design evolutionary pressure applied against promiscuous homodimer formation. This is achieved through the formation of highly specific contacts formed by charged residues as demonstrated both in model and real superfamily heterodimers.
- Published
- 2006
15. Assessing transcription factor motif drift from noisy decoy sequences
- Author
-
Timothy E. Reddy, Charles DeLisi, and Boris E. Shakhnovich
- Subjects
DNA-Binding Proteins; Chromatin Immunoprecipitation; Binding Sites; Base Sequence; Computational Biology; DNA; Saccharomyces cerevisiae; Genome, Fungal; Microarray Analysis; Promoter Regions, Genetic; Algorithms; Protein Binding; Transcription Factors
- Abstract
Genome scale identification of transcription factor binding sites (TFBS) is fundamental to understanding the complexities of mRNA expression at both the cell and organismal levels. While high-throughput experimental methods provide associations between transcription factors and the genes they regulate under a specified experimental condition, computational methods are still required to pinpoint the exact location of binding. Moreover, since the binding site is an intrinsic property of the promoter region, computational methods are in principle more general than condition dependent experimental methods. Computational identification of TFBSs is complicated in at least two different ways. First, transcription factors bind a heterogeneous distribution of sites and therefore have a distribution of affinities. Second, the set of sequences for which a common site is to be determined do not all have a site for the TF of interest. In this paper, we evaluate the robustness of TFBS identification with respect to both effects. We show that the addition of upstream regions that do not have the TFBS destroys the specificity of the predicted binding site. We also propose a method to calculate the distance between position weight matrices that can be used to measure "drift" from the canonical binding site. The results presented here could be useful in developing future transcription factor binding site identification algorithms.
- Published
- 2005
16. ELISA: a unified, multidimensional view of the protein domain universe
- Author
-
Boris E. Shakhnovich, John Max Harvey, and Charles DeLisi
- Subjects
Evolution, Molecular; Models, Genetic; Molecular Sequence Data; Proteins; Amino Acid Sequence; Databases, Protein; Phylogeny
- Abstract
ELISA (http://romi.bu.edu/elisa/) is a database that was designed for flexibility in defining interesting queries about protein domain evolution. We have defined and included both the inherent characteristics of the domains such as structure and function and comparisons of these characteristics between domains. Thus, the database is useful in defining structural and functional links between related protein domains and by extension sequences that encode them. In this database we introduce and employ a novel method of functional annotation and comparison. For each protein domain we create a probabilistic functional annotation tree using GO. We have designed an algorithm that accurately compares these trees and thus provides a measure of "functional distance" between two protein domains. Along with functional annotation, we have also included structural comparison between protein domains and best sequence comparisons to all known genomes. The latter enables researchers to dynamically do searches for domains sharing similar phylogenetic profiles. This combination of data and tools enables the researcher to design complex queries to carry out research in the areas of protein domain evolution, structure prediction and functional annotation of novel sequences.
- Published
- 2005
17. Comparisons of predicted genetic modules: identification of co-expressed genes through module gene flow
- Author
-
Boris E. Shakhnovich, Timothy E. Reddy, Kevin Galinsky, Joseph Mellor, and Charles DeLisi
- Subjects
Genes; Models, Genetic; Gene Expression Profiling; Databases, Genetic; Sequence Analysis, DNA; Algorithms
- Abstract
A question of fundamental importance is the definition and identification of modules from microarray experiments. A wide variety of techniques have been used to gain insight into the elucidation of such modules. One problem, however, is the inability to directly compare results between the different data sets produced due to the inherent parameterizations of their approaches. We first aim to provide a mechanism by which different approaches to module finding can be directly compared. Moreover, the same approach can be used to internally compare the modules predicted by the same technique, but at different parameterizations. We apply this approach to analyze the flow of genes through modules at different module thresholds of the Barkai Signature method, thereby further resolving the modules into sets of co-expressed genes.
- Published
- 2005
18. Quantifying structure-function uncertainty: a graph theoretical exploration into the origins and limitations of protein annotation
- Author
-
J. Max Harvey and Boris E. Shakhnovich
- Subjects
Theoretical computer science; Function space; Computer science; Biochemical Phenomena; Protein domain; Graph theory; Bioinformatics; Structural genomics; Annotation; Structure-Activity Relationship; Corollary; Protein Annotation; Structural Biology; Terminology as Topic; Computer Graphics; Entropy (information theory); Molecular Biology
- Abstract
Since the advent of investigations into structural genomics, research has focused on correctly identifying domain boundaries, as well as domain similarities and differences in the context of their evolutionary relationships. As the science of structural genomics ramps up adding more and more information into the databanks, questions about the accuracy and completeness of our classification and annotation systems appear on the forefront of this research. A central question of paramount importance is how structural similarity relates to functional similarity. Here, we begin to rigorously and quantitatively answer these questions by first exploring the consensus between the most common protein domain structure annotation databases CATH, SCOP and FSSP. Each of these databases explores the evolutionary relationships between protein domains using a combination of automatic and manual, structural and functional, continuous and discrete similarity measures. In order to examine the issue of consensus thoroughly, we build a generalized graph out of each of these databases and hierarchically cluster these graphs at interval thresholds. We then employ a distance measure to find regions of greatest overlap. Using this procedure we were able not only to enumerate the level of consensus between the different annotation systems, but also to define the graph-theoretical origins behind the annotation schema of class, family and superfamily by observing that the same thresholds that define the best consensus regions between FSSP, SCOP and CATH correspond to distinct, non-random phase-transitions in the structure comparison graph itself. To investigate the correspondence in divergence between structure and function further, we introduce a measure of functional entropy that calculates divergence in function space. First, we use this measure to calculate the general correlation between structural homology and functional proximity. 
We extend this analysis further by quantitatively calculating the average amount of functional information gained from our understanding of structural distance and the corollary inherent uncertainty that represents the theoretical limit of our ability to infer function from structural similarity. Finally we show how our measure of functional "entropy" translates into a more intuitive concept of functional annotation into similarity EC classes.
- Published
- 2003
19. Expanding protein universe and its origin from the biological Big Bang
- Author
-
Eugene I. Shakhnovich, Nikolay V. Dokholyan, and Boris E. Shakhnovich
- Subjects
Big Bang; Models, Molecular; Multidisciplinary; Proteins; Biology; Biological Sciences; Universe; Protein Structure, Tertiary; Evolution, Molecular; Molecular evolution; Abiogenesis; Evolutionary biology; Computer Simulation
- Abstract
The bottom-up approach to understanding the evolution of organisms is by studying molecular evolution. With the large number of protein structures identified in the past decades, we have discovered peculiar patterns that nature imprints on protein structural space in the course of evolution. In particular, we have discovered that the universe of protein structures is organized hierarchically into a scale-free network. By understanding the cause of these patterns, we attempt to glance at the very origin of life.
- Published
- 2002
20. Positional clustering improves computational binding site detection and identifies novel cis-regulatory sites in mammalian GABA(A) receptor subunit genes
- Author
-
Shelley J. Russek, Charles DeLisi, Boris E. Shakhnovich, Timothy E. Reddy, and Daniel S. Roberts
- Subjects
Protein subunit; Saccharomyces cerevisiae; Biology; gamma-Aminobutyric acid; 03 medical and health sciences; Mice; 0302 clinical medicine; Neurotransmitter receptor; Genetics; Transcriptional regulation; Animals; Cluster Analysis; Binding site; Promoter Regions, Genetic; Gene; Transcription factor; Cells, Cultured; 030304 developmental biology; Neurons; 0303 health sciences; Binding Sites; GABAA receptor; Computational Biology; Sequence Analysis, DNA; Receptors, GABA-A; Rats; Protein Subunits; Methods Online; 030217 neurology & neurosurgery; Algorithms; Transcription Factors
- Abstract
Understanding transcription factor (TF) mediated control of gene expression remains a major challenge at the interface of computational and experimental biology. Computational techniques predicting TF-binding site specificity are frequently unreliable. On the other hand, comprehensive experimental validation is difficult and time-consuming. We introduce a simple strategy that dramatically improves robustness and accuracy of computational binding site prediction. First, we evaluate the rate of recurrence of computational TFBS predictions by commonly used sampling procedures. We find that the vast majority of results are biologically meaningless. However, clustering results based on nucleotide position improves predictive power. Additionally, we find that positional clustering increases robustness to long or imperfectly selected input sequences. Positional clustering can also be used as a mechanism to integrate results from multiple sampling approaches for improvements in accuracy over each one alone. Finally, we predict and validate regulatory sequences partially responsible for transcriptional control of the mammalian type A gamma-aminobutyric acid receptor (GABA(A)R) subunit genes. Positional clustering is useful for improving computational binding site predictions, with potential application to improving our understanding of mammalian gene expression. In particular, predicted regulatory mechanisms in the mammalian GABA(A)R subunit gene family may open new avenues of research towards understanding this pharmacologically important neurotransmitter receptor system.
- Published
- 2007
- Full Text
- View/download PDF
Discovery Service for Jio Institute Digital Library