Author: "Patricia C. Babbitt" - Searchworks@Jio Institute Digital Library Search Results

Your search for '"Patricia C. Babbitt"' returned 189 results.

1. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

Author: Naihui Zhou, Yuxiang Jiang, Timothy R. Bergquist, Alexandra J. Lee, Balint Z. Kacsoh, Alex W. Crocker, Kimberley A. Lewis, George Georghiou, Huy N. Nguyen, Md Nafiz Hamid, Larry Davis, Tunca Dogan, Volkan Atalay, Ahmet S. Rifaioglu, Alperen Dalkıran, Rengul Cetin Atalay, Chengxin Zhang, Rebecca L. Hurto, Peter L. Freddolino, Yang Zhang, Prajwal Bhat, Fran Supek, José M. Fernández, Branislava Gemovic, Vladimir R. Perovic, Radoslav S. Davidović, Neven Sumonja, Nevena Veljkovic, Ehsaneddin Asgari, Mohammad R.K. Mofrad, Giuseppe Profiti, Castrense Savojardo, Pier Luigi Martelli, Rita Casadio, Florian Boecker, Heiko Schoof, Indika Kahanda, Natalie Thurlby, Alice C. McHardy, Alexandre Renaux, Rabie Saidi, Julian Gough, Alex A. Freitas, Magdalena Antczak, Fabio Fabris, Mark N. Wass, Jie Hou, Jianlin Cheng, Zheng Wang, Alfonso E. Romero, Alberto Paccanaro, Haixuan Yang, Tatyana Goldberg, Chenguang Zhao, Liisa Holm, Petri Törönen, Alan J. Medlar, Elaine Zosa, Itamar Borukhov, Ilya Novikov, Angela Wilkins, Olivier Lichtarge, Po-Han Chi, Wei-Cheng Tseng, Michal Linial, Peter W. Rose, Christophe Dessimoz, Vedrana Vidulin, Saso Dzeroski, Ian Sillitoe, Sayoni Das, Jonathan Gill Lees, David T. Jones, Cen Wan, Domenico Cozzetto, Rui Fa, Mateo Torres, Alex Warwick Vesztrocy, Jose Manuel Rodriguez, Michael L. Tress, Marco Frasca, Marco Notaro, Giuliano Grossi, Alessandro Petrini, Matteo Re, Giorgio Valentini, Marco Mesiti, Daniel B. Roche, Jonas Reeb, David W. Ritchie, Sabeur Aridhi, Seyed Ziaeddin Alborzi, Marie-Dominique Devignes, Da Chen Emily Koo, Richard Bonneau, Vladimir Gligorijević, Meet Barot, Hai Fang, Stefano Toppo, Enrico Lavezzo, Marco Falda, Michele Berselli, Silvio C.E. Tosatto, Marco Carraro, Damiano Piovesan, Hafeez Ur Rehman, Qizhong Mao, Shanshan Zhang, Slobodan Vucetic, Gage S. Black, Dane Jo, Erica Suh, Jonathan B. Dayton, Dallas J. Larsen, Ashton R. Omdahl, Liam J. McGuffin, Danielle A. Brackenridge, Patricia C. Babbitt, Jeffrey M. Yunes, Paolo Fontana, Feng Zhang, Shanfeng Zhu, Ronghui You, Zihan Zhang, Suyang Dai, Shuwei Yao, Weidong Tian, Renzhi Cao, Caleb Chandler, Miguel Amezola, Devon Johnson, Jia-Ming Chang, Wen-Hung Liao, Yi-Wei Liu, Stefano Pascarelli, Yotam Frank, Robert Hoehndorf, Maxat Kulmanov, Imane Boudellioua, Gianfranco Politano, Stefano Di Carlo, Alfredo Benso, Kai Hakala, Filip Ginter, Farrokh Mehryary, Suwisa Kaewphan, Jari Björne, Hans Moen, Martti E.E. Tolvanen, Tapio Salakoski, Daisuke Kihara, Aashish Jain, Tomislav Šmuc, Adrian Altenhoff, Asa Ben-Hur, Burkhard Rost, Steven E. Brenner, Christine A. Orengo, Constance J. Jeffery, Giovanni Bosco, Deborah A. Hogan, Maria J. Martin, Claire O’Donovan, Sean D. Mooney, Casey S. Greene, Predrag Radivojac, and Iddo Friedberg
Subjects: Protein function prediction, Long-term memory, Biofilm, Critical assessment, Community challenge, Biology (General), QH301-705.5, Genetics, QH426-470
Abstract: Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in the volume of data analyzed and in the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
Published: 2019

2. New computational approaches to understanding molecular protein function.

Author: Jacquelyn S Fetrow and Patricia C Babbitt
Subjects: Biology (General), QH301-705.5
Published: 2018

3. Kinetic and Structural Analysis of Two Linkers in the Tautomerase Superfamily: Analysis and Implications

Author: Christian P. Whitman, Bert-Jan Baas, Tamer S. Kaoud, Patricia C. Babbitt, Kaci Erwin, R. Yvette Moreno, Jake LeVieux, Marieke de Ruijter, Yan Jessie Zhang, William H. Johnson, Emily B. Lancaster, and Brenda P. Medellin
Subjects: Magnetic Resonance Spectroscopy, Arginine, Hydrolases, Stereochemistry, Biochemistry, Article, Catalysis, Evolution, Molecular, 03 medical and health sciences, Residue (chemistry), Catalytic Domain, Gene duplication, Amino Acid Sequence, Isomerases, Gene, Sequence (medicine), Dehalogenase, chemistry.chemical_classification, 0303 health sciences, Binding Sites, biology, Chemistry, 030302 biochemistry & molecular biology, Active site, Kinetics, Enzyme, Models, Chemical, biology.protein
Abstract: The tautomerase superfamily (TSF) is a collection of enzymes and proteins that share a simple β-α-β structural scaffold. Most members are constructed from a single-core β-α-β motif or two consecutively fused β-α-β motifs in which the N-terminal proline (Pro-1) plays a key and unusual role as a catalytic residue. The cumulative evidence suggests that a gene fusion event took place in the evolution of the TSF followed by duplication (of the newly fused gene) to result in the diversification of activity that is seen today. Analysis of the sequence similarity network (SSN) for the TSF identified several linking proteins ("linkers") whose similarity links subgroups of these contemporary proteins that might hold clues about structure-function relationship changes accompanying the emergence of new activities. A previously uncharacterized pair of linkers (designated N1 and N2) was identified in the SSN that connected the 4-oxalocrotonate tautomerase (4-OT) and cis-3-chloroacrylic acid dehalogenase (cis-CaaD) subgroups. N1, in the cis-CaaD subgroup, has the full complement of active site residues for cis-CaaD activity, whereas N2, in the 4-OT subgroup, lacks a key arginine (Arg-39) for canonical 4-OT activity. Kinetic characterization and nuclear magnetic resonance analysis show that N1 has activities observed for other characterized members of the cis-CaaD subgroup with varying degrees of efficiencies. N2 is a modest 4-OT but shows enhanced hydratase activity using allene and acetylene compounds, which might be due to the presence of Arg-8 along with Arg-11. Crystallographic analysis provides a structural context for these observations.
Published: 2021

4. Informatics Approaches in Structural Genomics - Session Introduction.

Author: Sean D. Mooney and Patricia C. Babbitt
Published: 2003

5. ViewFeature: Integrated Feature Analysis and Visualization.

Author: D. Rey Banatao, Conrad C. Huang, Patricia C. Babbitt, Russ B. Altman, and Teri E. Klein
Published: 2001

6. Structural Basis for the Asymmetry of a 4-Oxalocrotonate Tautomerase Trimer

Author: Yan Jessie Zhang, Patricia C. Babbitt, Swanand Rakhade, Emily B. Lancaster, Christian P. Whitman, Shoshana D. Brown, and Brenda P. Medellin
Subjects: Bordetella, Burkholderia, Stereochemistry, Trimer, Sequence alignment, Biochemistry, Oligomer, Article, 03 medical and health sciences, chemistry.chemical_compound, Protein structure, Bacterial Proteins, Cluster (physics), Amino Acid Sequence, Isomerases, Protein Structure, Quaternary, Peptide sequence, 0303 health sciences, Binding Sites, Burkholderiaceae, 030302 biochemistry & molecular biology, Computational Biology, Kinetics, Monomer, chemistry, 4-Oxalocrotonate tautomerase, Sequence Alignment, Alcaligenaceae
Abstract: Tautomerase superfamily (TSF) members are constructed from a single β−α−β unit or two consecutively joined β−α−β units. This pattern prevails throughout the superfamily consisting of more than 11,000 members where homo- or heterohexamers are localized in the 4-oxalocrotonate tautomerase (4-OT) subgroup and trimers are found in the other four subgroups. One exception is a subset of sequences that are double the length of the short 4-OTs in the 4-OT subgroup, where the coded proteins form trimers. Characterization of two members revealed an interesting dichotomy. One is a symmetric trimer, whereas the other one is an asymmetric trimer. One monomer is flipped 180° relative to the other two monomers so that three unique protein-protein interfaces are created that are composed of different residues. A bioinformatics analysis of the fused 4-OT subset shows a further division into two clusters with a total of 133 sequences. The analysis showed that members of one cluster (86 sequences) have more salt bridges if the asymmetric trimer forms, whereas the members of the other cluster (47 sequences) have more salt bridges if the symmetric trimer forms. This hypothesis was examined by the kinetic and structural characterization of two proteins within each cluster. As predicted, all four proteins function as 4-OTs, where two assemble into asymmetric trimers (designated R7 and F6) and two form symmetric trimers (designated W0 and Q0). These findings can be extended to the other sequences in the two clusters in the fused 4-OT subset, thereby annotating their oligomer properties and activities.
Published: 2020

7. Session Introduction.

Author: Sean D. Mooney, Philip E. Bourne, and Patricia C. Babbitt
Published: 2004

8. Representing Structure-Function Relationships in Mechanistically Diverse Enzyme Superfamilies.

Author: Scott C.-H. Pegg, Shoshana D. Brown, Sunil Ojha, Conrad C. Huang, Thomas E. Ferrin, and Patricia C. Babbitt
Published: 2005

9. Introduction to Informatics Approaches in Structural Genomics: Modeling and Representation of Function from Macromolecular Structure.

Author: Patricia C. Babbitt, Philip E. Bourne, and Sean D. Mooney
Published: 2005

10. Prediction and characterization of enzymatic activities guided by sequence similarity and genome neighborhood networks

Author: Suwen Zhao, Ayano Sakai, Xinshuai Zhang, Matthew W Vetting, Ritesh Kumar, Brandan Hillerich, Brian San Francisco, Jose Solbiati, Adam Steves, Shoshana Brown, Eyal Akiva, Alan Barber, Ronald D Seidel, Patricia C Babbitt, Steven C Almo, John A Gerlt, and Matthew P Jacobson
Subjects: sequence similarity network, genome neighborhood network, functional assignment, Medicine, Science, Biology (General), QH301-705.5
Abstract: Metabolic pathways in eubacteria and archaea often are encoded by operons and/or gene clusters (genome neighborhoods) that provide important clues for assignment of both enzyme functions and metabolic pathways. We describe a bioinformatic approach (genome neighborhood network; GNN) that enables large-scale prediction of the in vitro enzymatic activities and in vivo physiological functions (metabolic pathways) of uncharacterized enzymes in protein families. We demonstrate the utility of the GNN approach by predicting in vitro activities and in vivo functions in the proline racemase superfamily (PRS; InterPro IPR008794). The predictions were verified by measuring in vitro activities for 51 proteins in 12 families in the PRS that represent ~85% of the sequences; in vitro activities of pathway enzymes, carbon/nitrogen source phenotypes, and/or transcriptomic studies confirmed the predicted pathways. The synergistic use of sequence similarity networks and GNNs will facilitate the discovery of the components of novel, uncharacterized metabolic pathways in sequenced genomes.
Published: 2014

11. Large-scale determination of sequence, structure, and function relationships in cytosolic glutathione transferases across the biosphere.

Author: Susan T Mashiyama, M Merced Malabanan, Eyal Akiva, Rahul Bhosle, Megan C Branch, Brandan Hillerich, Kevin Jagessar, Jungwook Kim, Yury Patskovsky, Ronald D Seidel, Mark Stead, Rafael Toro, Matthew W Vetting, Steven C Almo, Richard N Armstrong, and Patricia C Babbitt
Subjects: Biology (General), QH301-705.5
Abstract: The cytosolic glutathione transferase (cytGST) superfamily comprises more than 13,000 nonredundant sequences found throughout the biosphere. Their key roles in metabolism and defense against oxidative damage have led to thousands of studies over several decades. Despite this attention, little is known about the physiological reactions they catalyze, and most of the substrates used to assay cytGSTs are synthetic compounds. A deeper understanding of relationships across the superfamily could provide new clues about their functions. To establish a foundation for expanded classification of cytGSTs, we generated similarity-based subgroupings for the entire superfamily. Using the resulting sequence similarity networks, we chose targets that broadly covered unknown functions and report here experimental results confirming GST-like activity for 82 of them, along with 37 new 3D structures determined for 27 targets. These new data, along with experimentally known GST reactions and structures reported in the literature, were painted onto the networks to generate a global view of their sequence-structure-function relationships. The results show how proteins of both known and unknown function relate to each other across the entire superfamily and reveal that the great majority of cytGSTs have not been experimentally characterized or annotated by canonical class. A mapping of taxonomic classes across the superfamily indicates that many taxa are represented in each subgroup and highlights challenges for classification of superfamily sequences into functionally relevant classes. Experimental determination of disulfide bond reductase activity in many diverse subgroups illustrates a theme common for many reaction types. Finally, sequence comparison between an enzyme that catalyzes a reductive dechlorination reaction relevant to bioremediation efforts and some of its closest homologs reveals differences among them likely to be associated with evolution of this unusual reaction. Interactive versions of the networks, associated with functional and other types of information, can be downloaded from the Structure-Function Linkage Database (SFLD; http://sfld.rbvi.ucsf.edu).
Published: 2014

12. Effusion: prediction of protein function from sequence similarity networks

Author: Patricia C. Babbitt and Jeffrey M. Yunes
Subjects: Statistics and Probability, 0303 health sciences, Protein function, Computer science, 030302 biochemistry & molecular biology, Computational Biology, Proteins, computer.software_genre, Original Papers, Biochemistry, Computer Science Applications, 03 medical and health sciences, Computational Mathematics, Gene Ontology, Computational Theory and Mathematics, Effusion, Protein function prediction, Data mining, Primary sequence, Sequence Analysis, Molecular Biology, computer, Software, 030304 developmental biology
Abstract: Motivation: Critical evaluation of methods for protein function prediction shows that data integration improves the performance of methods that predict protein function, but a basic BLAST-based method is still a top contender. We sought to engineer a method that modernizes the classical approach while avoiding pitfalls common to state-of-the-art methods. Results: We present a method for predicting protein function, Effusion, which uses a sequence similarity network to add context for homology transfer, a probabilistic model to account for the uncertainty in labels and function propagation, and the structure of the Gene Ontology (GO) to best utilize sparse input labels and make consistent output predictions. Effusion’s model makes it practical to integrate rare experimental data and abundant primary sequence and sequence similarity. We demonstrate Effusion’s performance using a critical evaluation method and provide an in-depth analysis. We also dissect the design decisions we used to address challenges for predicting protein function. Finally, we propose directions in which the framework of the method can be modified for additional predictive power. Availability and implementation: The source code for an implementation of Effusion is freely available at https://github.com/babbittlab/effusion. Supplementary information: Supplementary data are available at Bioinformatics online.
Published: 2018

13. Kinetic and structural characterization of a cis-3-Chloroacrylic acid dehalogenase homologue in Pseudomonas sp. UW4: A potential step between subgroups in the tautomerase superfamily

Author: Rebecca Davidson, Patricia C. Babbitt, Tamer S. Kaoud, Christian P. Whitman, Bert-Jan Baas, Jake LeVieux, and Yan Jessie Zhang
Subjects: 0301 basic medicine, Hydrolases, Stereochemistry, Biophysics, Crystallography, X-Ray, Biochemistry, Article, 03 medical and health sciences, Bacterial Proteins, Protein Domains, Pseudomonas, Hydrolase, Molecular Biology, Dehalogenase, chemistry.chemical_classification, 030102 biochemistry & molecular biology, biology, Active site, biology.organism_classification, Amino acid, Kinetics, 030104 developmental biology, Enzyme, chemistry, biology.protein, UniProt, Function (biology)
Abstract: A Pseudomonas sp. UW4 protein (UniProt K9NIA5) of unknown function was identified as similar to 4-oxalocrotonate tautomerase (4-OT)-like and cis-3-chloroacrylic acid dehalogenase (cis-CaaD)-like subgroups of the tautomerase superfamily (TSF). This protein lacks only Tyr-103 of the amino acids critical for cis-CaaD activity (Pro-1, His-28, Arg-70, Arg-73, Tyr-103, Glu-114). As it may represent an important variant of these enzymes, its kinetic and structural properties have been determined. The protein shows tautomerase activity with phenylenolpyruvate, but lacks native 4-OT activity and dehalogenase activity with the isomers of 3-chloroacrylic acid. It shows mostly low-level hydratase activity at pH 7.0, converting 2-oxo-3-pentynoate to acetopyruvate, consistent with cis-CaaD-like behavior. At pH 9.0, this compound results primarily in covalent modification of Pro-1, which is consistent with 4-OT-like behavior. These observations could reflect a pKa for Pro-1 that is closer to that of cis-CaaD (∼9.2) than to 4-OT (∼6.4). A structure of the native enzyme, at 2.6 Å resolution, highlights differences at the active site from those of 4-OT and cis-CaaD that add to our understanding of how contemporary TSF reactions and mechanisms may have diverged from a common 4-OT-like ancestor.
Published: 2017

14. An atlas of the thioredoxin fold class reveals the complexity of function-enabling adaptations.

Author: Holly J Atkinson and Patricia C Babbitt
Subjects: Biology (General), QH301-705.5
Abstract: The group of proteins that contain a thioredoxin (Trx) fold is huge and diverse. Assessment of the variation in catalytic machinery of Trx fold proteins is essential in providing a foundation for understanding their functional diversity and predicting the function of the many uncharacterized members of the class. The proteins of the Trx fold class retain common features, including variations on a dithiol CxxC active site motif, that lead to delivery of function. We use protein similarity networks to guide an analysis of how structural and sequence motifs track with catalytic function and taxonomic categories for 4,082 representative sequences spanning the known superfamilies of the Trx fold. Domain structure in the fold class is varied and modular, with 2.8% of sequences containing more than one Trx fold domain. Most member proteins are bacterial. The fold class exhibits many modifications to the CxxC active site motif: only 56.8% of proteins have both cysteines, and no functional groupings have absolute conservation of the expected catalytic motif. Only a small fraction of Trx fold sequences have been functionally characterized. This work provides a global view of the complex distribution of domains and catalytic machinery throughout the fold class, showing that each superfamily contains remnants of the CxxC active site. The unifying context provided by this work can guide the comparison of members of different Trx fold superfamilies to gain insight about their structure-function relationships, illustrated here with the thioredoxins and peroxiredoxins.
Published: 2009

15. Using sequence similarity networks for visualization of relationships across diverse protein superfamilies.

Author: Holly J Atkinson, John H Morris, Thomas E Ferrin, and Patricia C Babbitt
Subjects: Medicine, Science
Abstract: The dramatic increase in heterogeneous types of biological data, in particular the abundance of new protein sequences, requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies from the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity (GPCRs and kinases from humans, and the crotonase superfamily of enzymes), we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships.
Published: 2009

16. A global view of structure–function relationships in the tautomerase superfamily

Author: Patricia C. Babbitt, Jake LeVieux, Eyal Akiva, Yan Jessie Zhang, Rebecca Davidson, Collin R. Pullara, Christian P. Whitman, Bert-Jan Baas, Benjamin J. Polacco, and Gemma L. Holliday
Subjects: 0301 basic medicine, Molecular Sequence Data, structure–function relationships, Computational biology, Biology, Crystallography, X-Ray, Biochemistry, structure-function, tautomerase superfamily, enzyme superfamily, Evolution, Molecular, 03 medical and health sciences, protein sequence, Protein sequencing, Protein structure, Three-domain system, Gene duplication, evolution, Animals, Humans, Amino Acid Sequence, protein structure, protein evolution, Molecular Biology, Phylogeny, Dehalogenase, Sequence (medicine), Binding Sites, 030102 biochemistry & molecular biology, Computational Biology, Eukaryota, Cell Biology, Plants, Enzyme structure, enzyme structure, Enzymes, Kinetics, 030104 developmental biology, Multigene Family, Sequence Alignment, Function (biology)
Abstract: The tautomerase superfamily (TSF) consists of more than 11,000 nonredundant sequences present throughout the biosphere. Characterized members have attracted much attention because of the unusual and key catalytic role of an N-terminal proline. These few characterized members catalyze a diverse range of chemical reactions, but the full scale of their chemical capabilities and biological functions remains unknown. To gain new insight into TSF structure-function relationships, we performed a global analysis of similarities across the entire superfamily and computed a sequence similarity network to guide classification into distinct subgroups. Our results indicate that TSF members are found in all domains of life, with most being present in bacteria. The eukaryotic members of the cis-3-chloroacrylic acid dehalogenase subgroup are limited to fungal species, whereas the macrophage migration inhibitory factor subgroup has wide eukaryotic representation (including mammals). Unexpectedly, we found that 346 TSF sequences lack Pro-1, of which 85% are present in the malonate semialdehyde decarboxylase subgroup. The computed network also enabled the identification of similarity paths, namely sequences that link functionally diverse subgroups and exhibit transitional structural features that may help explain reaction divergence. A structure-guided comparison of these linker proteins identified conserved transitions between them, and kinetic analysis paralleled these observations. Phylogenetic reconstruction of the linker set was consistent with these findings. Our results also suggest that contemporary TSF members may have evolved from a short 4-oxalocrotonate tautomerase-like ancestor followed by gene duplication and fusion. Our new linker-guided strategy can be used to enrich the discovery of sequence/structure/function transitions in other enzyme superfamilies.
Published: 2017

17. Evolutionary and molecular foundations of multiple contemporary functions of the nitroreductase superfamily

Author: Patricia C. Babbitt, Eyal Akiva, Nobuhiko Tokuriki, and Janine N. Copp
Subjects: 0301 basic medicine, Models, Molecular, Scaffold, Evolution, Flavin Mononucleotide, Computational biology, Biology, nitroreductase, Evolution, Molecular, 03 medical and health sciences, Functional diversity, Nitroreductase, Models, Three-domain system, flavoenzyme, evolution, Computational analysis, Phylogeny, Genetics, Multidisciplinary, 030102 biochemistry & molecular biology, Molecular, Computational Biology, SUPERFAMILY, Biological Sciences, Nitroreductases, Phylogenetic reconstruction, Biophysics and Computational Biology, 030104 developmental biology, PNAS Plus, enzyme superfamilies, sequence similarity network, Function (biology)
Significance: Functionally diverse enzyme superfamilies are sets of homologs that conserve a structural fold and mechanistic details but perform various distinct chemical reactions. What are the evolutionary routes by which ancestral proteins diverge to produce extant enzymes? We present an approach that combines experimental data with computational tools to trace these sequence–structure–function transitions in a model system, the functionally diverse flavin mononucleotide-dependent nitroreductases (NTRs). Our results suggest an evolutionary model in which contemporary NTR classes have diverged in a radial manner from a minimal flavin-binding scaffold via insertions at key positions and fixation of functional residues, yielding the reaction versatility of contemporary enzymes. These principles will facilitate rational design of NTRs and advance general approaches for delineating the emergence of functional diversity in enzyme superfamilies.
Abstract: Insight regarding how diverse enzymatic functions and reactions have evolved from ancestral scaffolds is fundamental to understanding chemical and evolutionary biology, and for the exploitation of enzymes for biotechnology. We undertook an extensive computational analysis using a unique and comprehensive combination of tools that include large-scale phylogenetic reconstruction to determine the sequence, structural, and functional relationships of the functionally diverse flavin mononucleotide-dependent nitroreductase (NTR) superfamily (>24,000 sequences from all domains of life, 54 structures, and >10 enzymatic functions). Our results suggest an evolutionary model in which contemporary subgroups of the superfamily have diverged in a radial manner from a minimal flavin-binding scaffold. We identified the structural design principle for this divergence: insertions at key positions in the minimal scaffold that, combined with the fixation of key residues, have led to functional specialization. These results will aid future efforts to delineate the emergence of functional diversity in enzyme superfamilies, provide clues for functional inference for superfamily members of unknown function, and facilitate rational redesign of the NTR scaffold.
Published: 2017

18. An approach to functionally relevant clustering of the protein universe: Active site profile-based clustering of protein structures and sequences

Author: Patricia C. Babbitt, Brian M. Westwood, Jacquelyn S. Fetrow, Julia D. Hayden, Thomas E. Ferrin, Kiran Kumar, Don Nguyendac, Shoshana D. Brown, Janelle B. Leuthaeuser, Brandon E. Turner, Stacy T. Knutson, John H. Morris, Gabrielle Shea, and Angela F. Harper
Subjects: 0301 basic medicine, Enolase, Active site, SUPERFAMILY, Computational biology, Biology, Bioinformatics, Biochemistry, Glutathione transferase, 03 medical and health sciences, 030104 developmental biology, Protein structure, GenBank, biology.protein, False positive rate, Cluster analysis, Molecular Biology
Abstract: Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow identification of mechanistic determinants, the amino acids that distinguish functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by the Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well curated by the SFLD. Correlation is strong, although the small numbers of structures prevent a statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results.
Published: 2017

19. A strategy for large-scale comparison of evolutionary- and reaction-based classifications of enzyme function

Author: Patricia C. Babbitt, Gemma L. Holliday, Shoshana D. Brown, David Mischel, and Benjamin J. Polacco
Subjects: Structure–reaction relationships, Computational biology, General Biochemistry, Genetics and Molecular Biology, DNA sequencing, Evolution- versus reaction-based enzyme classification, Enzyme function, 03 medical and health sciences, Structure-Activity Relationship, Protein structure, Comparison of enzyme classification systems, Similarity (network science), Protein function prediction, Databases, Protein, 030304 developmental biology, 0303 health sciences, Sequence, biology, Enolase superfamily, 030302 biochemistry & molecular biology, Computational Biology, Enzyme Commission number, Molecular Sequence Annotation, Functional annotation, Enzymes, Structure–function relationships, Enzyme classification systems, biology.protein, Original Article, General Agricultural and Biological Sciences, Function (biology), Information Systems
Abstract: Determining the molecular function of enzymes discovered by genome sequencing represents a primary foundation for understanding many aspects of biology. Historically, classification of enzyme reactions has used the enzyme nomenclature system developed to describe the overall reactions performed by biochemically characterized enzymes, irrespective of their associated sequences. In contrast, functional classification and assignment for the millions of protein sequences of unknown function now available is largely done in two computational steps: first by similarity-based assignment of newly obtained sequences to homologous groups, followed by transferring to them the known functions of similar biochemically characterized homologs. Due to the fundamental differences in their etiologies and practice, how these chemistry- and evolution-centric functional classification systems relate to each other has been difficult to explore on a large scale. To investigate this issue in a new way, we integrated two published ontologies that had previously described each of these classification systems independently. The resulting infrastructure was then used to compare the functional assignments obtained from each classification system for the well-studied and functionally diverse enolase superfamily. Mapping these function assignments to protein structure and reaction similarity networks shows a profound and complex disconnect between the homology- and chemistry-based classification systems. This conclusion mirrors previous observations suggesting that, except for closely related sequences, facile annotation transfer from small numbers of characterized enzymes to the huge number of uncharacterized homologs to which they are related is problematic. 
Our extension of these comparisons to large enzyme superfamilies in a computationally intelligent manner provides a foundation for new directions in protein function prediction for the huge proportion of sequences of unknown function represented in major databases. Interactive sequence, reaction, substrate and product similarity networks computed for this work for the enolase and two other superfamilies are freely available for download from the Structure Function Linkage Database Archive (http://sfld.rbvi.ucsf.edu).
Published: 2019

20. Parallel molecular mechanisms for enzyme temperature adaptation

Author: Tzanko Doukov, Margaux M. Pinney, D. M. Sanchez, Ruibin Liang, Daniel Herschlag, Patricia C. Babbitt, Daniel A. Mokhtari, Todd J. Martínez, Filip Yabukarski, and Eyal Akiva
Subjects: chemistry.chemical_classification, Molecular interactions, Multidisciplinary, Extramural, Chemistry, Temperature, Amino acid substitution, Steroid Isomerases, Isomerase, Adaptation, Physiological, Evolution, Molecular, Enzyme, Amino Acid Substitution, Bacterial Proteins, Evolutionary biology, Molecular evolution, Enzyme Stability, Mutation, Epistasis, Adaptation
Abstract: Some like it hot, others not. Enzymes strike a delicate balance between features that enhance chemical reactivity and those that contribute to stable structure. Both features are important and can be unrelated or antagonistic. Pinney et al. combined rich experimental work on thermophilic and mesophilic variants of the enzyme ketosteroid isomerase (KSI) with bioinformatic data from a diverse set of bacterial enzymes to reveal the molecular determinants of thermal adaptation in enzymes. For KSI, they observed a trade-off between activity and thermal stability that comes down to a single active-site residue. With their larger dataset, they identified patterns of individual amino acid substitutions that are favored at higher temperatures and also considered how networks of stabilizing interactions develop. Science, this issue p. eaay2784
Published: 2019

21. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

Author: Renzhi Cao, Alice C. McHardy, Cen Wan, Jonathan G. Lees, Vedrana Vidulin, Alex Warwick Vesztrocy, Huy N Nguyen, Devon Johnson, Ian Sillitoe, Alessandro Petrini, Richard Bonneau, Hans Moen, Peter L. Freddolino, Rui Fa, Alfredo Benso, Jianlin Cheng, Indika Kahanda, Qizhong Mao, Zihan Zhang, Chenguang Zhao, Rebecca L. Hurto, Predrag Radivojac, Stefano Di Carlo, Sayoni Das, Suwisa Kaewphan, Sabeur Aridhi, Alan Medlar, Casey S. Greene, Constance J. Jeffery, Christophe Dessimoz, Jose Manuel Rodriguez, Gianfranco Politano, Michele Berselli, Jia-Ming Chang, Deborah A. Hogan, Julian Gough, Tunca Doğan, David T. Jones, Claire O'Donovan, Volkan Atalay, Paolo Fontana, Feng Zhang, Shuwei Yao, Robert Hoehndorf, Olivier Lichtarge, Alex W. Crocker, Ahmet Sureyya Rifaioglu, Rabie Saidi, Farrokh Mehryary, Neven Sumonja, Yang Zhang, Florian Boecker, Jie Hou, Christine A. Orengo, Matteo Re, Natalie Thurlby, Chengxin Zhang, Stefano Pascarelli, Alberto Paccanaro, Hafeez Ur Rehman, Yuxiang Jiang, Mohammad R. K. Mofrad, Naihui Zhou, Asa Ben-Hur, Steven E. Brenner, Martti Tolvanen, Filip Ginter, Mark N. Wass, Patricia C. Babbitt, David W. Ritchie, George Georghiou, Stefano Toppo, Caleb Chandler, Larry Davis, Da Chen Emily Koo, Itamar Borukhov, Petri Törönen, Rengul Cetin-Atalay, Fabio Fabris, Haixuan Yang, Kai Hakala, Silvio C. E. Tosatto, Domenico Cozzetto, Slobodan Vucetic, Balint Z. Kacsoh, Luke W Sagers, Alex A. Freitas, Tapio Salakoski, Fran Supek, Alfonso E. Romero, Angela D. Wilkins, Elaine Zosa, Shanshan Zhang, Yotam Frank, Jonathan B. Dayton, Jeffrey M. Yunes, Pier Luigi Martelli, Dallas J. Larsen, Giuliano Grossi, Alexandra J. Lee, Marco Mesiti, Yi-Wei Liu, Jonas Reeb, Damiano Piovesan, Sean D. Mooney, Magdalena Antczak, Erica Suh, Marco Falda, Marie-Dominique Devignes, Castrense Savojardo, Zheng Wang, Danielle A Brackenridge, Peter W. Rose, Enrico Lavezzo, Dane Jo, Ronghui You, Tomislav Šmuc, Liam J. McGuffin, Michael L. Tress, Ilya Novikov, Adrian M. 
Altenhoff, Burkhard Rost, Miguel Amezola, Mateo Torres, Prajwal Bhat, Wen-Hung Liao, Meet Barot, Marco Notaro, Suyang Dai, Giorgio Valentini, Jari Björne, Nevena Veljkovic, Wei-Cheng Tseng, Po-Han Chi, Alperen Dalkiran, Maxat Kulmanov, Nafiz Hamid, Aashish Jain, Branislava Gemovic, Alexandre Renaux, Ashton Omdahl, Daniel B. Roche, Vladimir Perovic, Iddo Friedberg, Daisuke Kihara, Giovanni Bosco, Gage S. Black, Saso Dzeroski, Liisa Holm, Marco Frasca, Michal Linial, Ehsaneddin Asgari, Tatyana Goldberg, Maria Jesus Martin, Vladimir Gligorijević, Marco Carraro, Shanfeng Zhu, Radoslav Davidovic, Timothy Bergquist, Hai Fang, José M. Fernández, Giuseppe Profiti, Weidong Tian, Imane Boudellioua, Kimberley A. Lewis, Seyed Ziaeddin Alborzi, and Rita Casadio
Subjects: 0303 health sciences, Protein function, biology, Computer science, 030302 biochemistry & molecular biology, Pseudomonas, Computational biology, Biological process, biology.organism_classification, Genome, 3. Good health, 03 medical and health sciences, Molecular function, Cellular component, Mutation screening, Critical assessment, Protein function prediction, Gene, Function (biology), 030304 developmental biology
Abstract: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Here we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in terms of the volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility (P. aeruginosa only). We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. We conclude that, while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. We finally report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
Published: 2019

22. Structural, Kinetic, and Mechanistic Analysis of an Asymmetric 4-Oxalocrotonate Tautomerase Trimer

Author: Yan Jessie Zhang, Brenda P. Medellin, Patricia C. Babbitt, Marieke de Ruijter, Christian P. Whitman, Bert-Jan Baas, Jake LeVieux, Eyal Akiva, and Shoshana D. Brown
Subjects: Stereochemistry, Burkholderia, Trimer, Sequence alignment, Biochemistry, Oligomer, Article, 03 medical and health sciences, chemistry.chemical_compound, Protein structure, Bacterial Proteins, Catalytic Domain, Hydrolase, Amino Acid Sequence, Isomerases, Protein Structure, Quaternary, 0303 health sciences, biology, Chemistry, Pseudomonas putida, 030302 biochemistry & molecular biology, Active site, Kinetics, Models, Chemical, Covalent bond, Mutation, 4-Oxalocrotonate tautomerase, biology.protein, Fatty Acids, Unsaturated, Sequence Alignment
Abstract: A 4-oxalocrotonate tautomerase (4-OT) trimer has been isolated from Burkholderia lata, and a kinetic, mechanistic, and structural analysis has been performed. The enzyme represents the third described oligomeric state for 4-OT, alongside a homohexamer and a heterohexamer. The 4-OT trimer is part of a small subset of sequences (133 sequences) within the 4-OT subgroup of the tautomerase superfamily (TSF). The TSF has two distinct features: members are composed of a single β-α-β unit (homo- and heterohexamer) or two consecutively joined β-α-β units (trimer), and they generally have a catalytic amino-terminal proline. The enzyme, designated as fused 4-OT, functions as a 4-OT in which the active site groups (Pro-1, Arg-39, Arg-76, Phe-115, Arg-127) mirror those in the canonical 4-OT from Pseudomonas putida mt-2. Inactivation by 2-oxo-3-pentynoate suggests that Pro-1 of fused 4-OT has a low pKa, enabling the prolyl nitrogen to function as a general base. A remarkable feature of the fused 4-OT is the absence of P3 rotational symmetry in the structure (1.5 Å resolution). The asymmetric arrangement of the trimer is not due to the fusion of the two β-α-β building blocks, because an engineered "unfused" variant that breaks the covalent bond between the two units (to generate a heterohexamer) assumes the same asymmetric oligomerization state. It remains unknown how the different active site configurations contribute to the observed overall activities and whether the asymmetry has a biological purpose or role in the evolution of TSF members.
Published: 2019

23. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

Author: Heiko Schoof, Ahmet Sureyya Rifaioglu, Ian Sillitoe, Shanfeng Zhu, Marco Carraro, Naihui Zhou, Asa Ben-Hur, Rui Fa, Alice C. McHardy, David W. Ritchie, George Georghiou, Filip Ginter, Haixuan Yang, Alex A. Freitas, Constance J. Jeffery, Tapio Salakoski, Radoslav Davidovic, Huy N Nguyen, Devon Johnson, Yotam Frank, Alexandra J. Lee, Sean D. Mooney, Marco Falda, Marie-Dominique Devignes, Gianfranco Politano, David T. Jones, Silvio C. E. Tosatto, Renzhi Cao, Zihan Zhang, Sabeur Aridhi, Stefano Pascarelli, Vedrana Vidulin, Qizhong Mao, Balint Z. Kacsoh, Patricia C. Babbitt, Giovanni Bosco, Farrokh Mehryary, Florian Boecker, Alfonso E. Romero, Angela D. Wilkins, Saso Dzeroski, Richard Bonneau, Hans Moen, Chengxin Zhang, Prajwal Bhat, Giuliano Grossi, Martti Tolvanen, Matteo Re, Meet Barot, Mohammad R. K. Mofrad, Predrag Radivojac, Stefano Di Carlo, Tatyana Goldberg, Branislava Gemovic, Suyang Dai, Pier Luigi Martelli, Giorgio Valentini, Maxat Kulmanov, Maria Jesus Martin, Claire O'Donovan, Dallas J. Larsen, Alexandre Renaux, Alan Medlar, Jeffrey M. Yunes, Erica Suh, Volkan Atalay, Vladimir Gligorijević, Fran Supek, Elaine Zosa, Wei-Cheng Tseng, Nafiz Hamid, Marco Mesiti, Tunca Doğan, Petri Törönen, Hafeez Ur Rehman, Jose Manuel Rodriguez, Alessandro Petrini, Sayoni Das, Burkhard Rost, Miguel Amezola, Mateo Torres, Jianlin Cheng, Daisuke Kihara, Liisa Holm, Marco Frasca, Steven E. Brenner, Stefano Toppo, Adrian M. Altenhoff, Chenguang Zhao, Daniel B. Roche, Alperen Dalkiran, Alex W. Crocker, Marco Notaro, Iddo Friedberg, Michal Linial, Julian Gough, Damiano Piovesan, Slobodan Vucetic, Natalie Thurlby, Olivier Lichtarge, Jari Björne, Jonas Reeb, Rabie Saidi, Yuxiang Jiang, Christophe Dessimoz, Jie Hou, Ronghui You, Tomislav Šmuc, Paolo Fontana, Michele Berselli, Jia-Ming Chang, Deborah A. Hogan, Larry Davis, Ehsaneddin Asgari, Shuwei Yao, Zheng Wang, Fabio Fabris, Michael L. Tress, Caleb Chandler, Christine A. 
Orengo, Rengul Cetin Atalay, Castrense Savojardo, Danielle A Brackenridge, Peter W. Rose, Yang Zhang, Dane Jo, Gage S. Black, Shanshan Zhang, Aashish Jain, Liam J. McGuffin, Timothy Bergquist, Peter L. Freddolino, Robert Hoehndorf, Rita Casadio, Da Chen Emily Koo, Mark N. Wass, Hai Fang, Casey S. Greene, Suwisa Kaewphan, Magdalena Antczak, Wen-Hung Liao, Enrico Lavezzo, Neven Sumonja, Ashton Omdahl, José M. Fernández, Ilya Novikov, Jonathan B. Dayton, Feng Zhang, Vladimir Perovic, Cen Wan, Jonathan G. Lees, Kai Hakala, Weidong Tian, Alex Warwick Vesztrocy, Domenico Cozzetto, Nevena Veljkovic, Yi-Wei Liu, Imane Boudellioua, Po-Han Chi, Kimberley A. Lewis, Seyed Ziaeddin Alborzi, Giuseppe Profiti, Alberto Paccanaro, Itamar Borukhov, Alfredo Benso, Indika Kahanda, Rebecca L. Hurto, Computer Engineering, National Science Foundation (United States), Gordon and Betty Moore Foundation, United States Department of Health & Human Services, Cystic Fibrosis Foundation, Consejo Nacional de Ciencia y Tecnología (Mexico), Deutsche Forschungsgemeinschaft (Germany), European Research Council, Ministry of Science and Innovation (Spain), European Union, University of Turku (Finland), Academy of Finland (Finland), National Natural Science Foundation of China, Nanjing Agricultural University. The Academy of Science. National Key Research & Development Program of China, Ministry of Education, University and Research (Italy), Shanghai Municipal Science and Technology Major Project, Biotechnology and Biological Sciences Research Council (United Kingdom), Extreme Science and Engineering Discovery Environment, Ministry of Education, Science and Technological Development (Serbia), Ministry of Science and Technology, Ministry for Education (Bavaria, Germany), Yad Hanadiv, University of Milan (Italy), Swiss National Science Foundation, European Union. 
European Cooperation in Science and Technology (COST), Plataforma ISCIII de Bioinformática (Spain), Scientific and Technological Research Council of Turkey, Ministry of Education (China), University of Padua (Italy), Faculty of Engineering and Natural Sciences, Department of Computer Engineering, Rifaioğlu, Ahmet Süreyya, Zhou N., Jiang Y., Bergquist T.R., Lee A.J., Kacsoh B.Z., Crocker A.W., Lewis K.A., Georghiou G., Nguyen H.N., Hamid M.N., Davis L., Dogan T., Atalay V., Rifaioglu A.S., Dalkiran A., Cetin Atalay R., Zhang C., Hurto R.L., Freddolino P.L., Zhang Y., Bhat P., Supek F., Fernandez J.M., Gemovic B., Perovic V.R., Davidovic R.S., Sumonja N., Veljkovic N., Asgari E., Mofrad M.R.K., Profiti G., Savojardo C., Martelli P.L., Casadio R., Boecker F., Schoof H., Kahanda I., Thurlby N., McHardy A.C., Renaux A., Saidi R., Gough J., Freitas A.A., Antczak M., Fabris F., Wass M.N., Hou J., Cheng J., Wang Z., Romero A.E., Paccanaro A., Yang H., Goldberg T., Zhao C., Holm L., Toronen P., Medlar A.J., Zosa E., Borukhov I., Novikov I., Wilkins A., Lichtarge O., Chi P.-H., Tseng W.-C., Linial M., Rose P.W., Dessimoz C., Vidulin V., Dzeroski S., Sillitoe I., Das S., Lees J.G., Jones D.T., Wan C., Cozzetto D., Fa R., Torres M., Warwick Vesztrocy A., Rodriguez J.M., Tress M.L., Frasca M., Notaro M., Grossi G., Petrini A., Re M., Valentini G., Mesiti M., Roche D.B., Reeb J., Ritchie D.W., Aridhi S., Alborzi S.Z., Devignes M.-D., Koo D.C.E., Bonneau R., Gligorijevic V., Barot M., Fang H., Toppo S., Lavezzo E., Falda M., Berselli M., Tosatto S.C.E., Carraro M., Piovesan D., Ur Rehman H., Mao Q., Zhang S., Vucetic S., Black G.S., Jo D., Suh E., Dayton J.B., Larsen D.J., Omdahl A.R., McGuffin L.J., Brackenridge D.A., Babbitt P.C., Yunes J.M., Fontana P., Zhang F., Zhu S., You R., Zhang Z., Dai S., Yao S., Tian W., Cao R., Chandler C., Amezola M., Johnson D., Chang J.-M., Liao W.-H., Liu Y.-W., Pascarelli S., Frank Y., Hoehndorf R., Kulmanov M., Boudellioua I., Politano G., Di Carlo 
S., Benso A., Hakala K., Ginter F., Mehryary F., Kaewphan S., Bjorne J., Moen H., Tolvanen M.E.E., Salakoski T., Kihara D., Jain A., Smuc T., Altenhoff A., Ben-Hur A., Rost B., Brenner S.E., Orengo C.A., Jeffery C.J., Bosco G., Hogan D.A., Martin M.J., O'Donovan C., Mooney S.D., Greene C.S., Radivojac P., Friedberg I., Faculty of Economic and Social Sciences and Solvay Business School, Faculty of Sciences and Bioengineering Sciences, Faculty of Engineering, Computational genomics, Institute of Biotechnology, Bioinformatics, Genetics, Helsinki Institute of Life Science HiLIFE, Discovery Research Group/Prof. Hannu Toivonen, Iowa State University (ISU), European Bioinformatics Institute, École Polytechnique de Montréal (EPM), Vinča Institute of Nuclear Sciences, University of Belgrade [Belgrade], University of Bologna, Max Planck Institute for Plant Breeding Research (MPIPZ), European Virus Bioinformatics Center [Jena], Université libre de Bruxelles (ULB), Laboratoire d'Informatique, de Modélisation et d'optimisation des Systèmes (LIMOS), SIGMA Clermont (SIGMA Clermont)-Université d'Auvergne - Clermont-Ferrand I (UdA)-Ecole Nationale Supérieure des Mines de St Etienne-Centre National de la Recherche Scientifique (CNRS)-Université Blaise Pascal - Clermont-Ferrand 2 (UBP), Department of Computer Science, University of Bristol [Bristol], Department of Computer Science [Columbia], University of Missouri [Columbia] (Mizzou), University of Missouri System-University of Missouri System, Yale School of Public Health (YSPH), Departamento de Geometría y Topología, Universidad de Granada (UGR), Tumor Biology Center, Centre for Nephrology [London, UK], University College of London [London] (UCL), Baylor College of Medicine (BCM), Baylor University, Department of Knowledge Technologies, Structural and Molecular Biology Department, University College London, Queen Mary University of London (QMUL), Spanish National Cancer Research Center (CNIO), Dipartimento di Informatica, 
Università degli Studi di Milano [Milano] (UNIMI), Dipartimento di Scienze dell'Informazione [Milano], United States Naval Academy, Computational Algorithms for Protein Structures and Interactions (CAPSID), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Complex Systems, Artificial Intelligence & Robotics (LORIA - AIS), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Department of Molecular Medicine, Universita degli Studi di Padova, Centro de Regulación Genómica (CRG), Universitat Pompeu Fabra [Barcelona] (UPF), Physics Department, National Tsing Hua University [Hsinchu] (NTHU), Dipartimento di Automatica e Informatica [Torino] (DAUIN), Politecnico di Torino = Polytechnic of Turin (Polito), University of Turku, Bioinformatics Laboratory, University of Turku-Turku Center for Computer Science, Toyota Technological Institute at Chicago [Chicago] (TTIC), Swiss Institute of Bioinformatics [Lausanne] (SIB), Université de Lausanne (UNIL), Department of Computer Science [Colorado State University], Colorado State University [Fort Collins] (CSU), Centre for Plant Integrative Biology [Nottingham] (CPIB), University of Nottingham, UK (UON), BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 
56, 38106 Braunschweig, Germany, University of Bologna/Università di Bologna, Université Blaise Pascal - Clermont-Ferrand 2 (UBP)-Université d'Auvergne - Clermont-Ferrand I (UdA)-SIGMA Clermont (SIGMA Clermont)-Ecole Nationale Supérieure des Mines de St Etienne (ENSM ST-ETIENNE)-Centre National de la Recherche Scientifique (CNRS), Universidad de Granada = University of Granada (UGR), Università degli Studi di Milano = University of Milan (UNIMI), Università degli Studi di Padova = University of Padua (Unipd), and Université de Lausanne = University of Lausanne (UNIL)
Subjects: Library, Male, Identification, Candida-albicans, Protein function prediction, Long-term memory, Biofilm, Critical assessment, Community challenge, Procedures, Genome, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], 0302 clinical medicine, Candida albicans, Molecular genetics, lcsh:QH301-705.5, ComputingMilieux_MISCELLANEOUS, Biological ontology, Settore BIO/11 - BIOLOGIA MOLECOLARE, 0303 health sciences, 318 Medical biotechnology, Biotechnology & applied microbiology, Ontology, Expectation, Genetics & heredity, Plant leaf, ddc, 3. Good health, Drosophila melanogaster, Human experiment, Fungal genome, Pseudomonas aeruginosa, Female, [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], Genome, Fungal, BIOINFORMATICS, Long-Term memory, Locomotion, Human, Adult, Memory, Long-Term, lcsh:QH426-470, Bioinformatics, Long term memory, Generation, Bacterial genome, Computational biology, Biology, Article, 03 medical and health sciences, Annotation, Big data, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Pseudomonas, Genetics, Animals, Humans, Gene, Ecology, Evolution, Behavior and Systematics, 030304 developmental biology, [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB], Animal, Research, Experimental data, Molecular Sequence Annotation, Cell Biology, Nonhuman, Human genetics, lcsh:Genetics, lcsh:Biology (General), Biofilms, Proteins | Genes | Protein functions, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], 030217 neurology & neurosurgery, Function (biology), Genome, Bacterial
Abstract: Tosatto, Silvio/0000-0003-4525-7793; Zhang, Feng/0000-0003-3447-897X; Gonzalez, Jose Maria Fernandez/0000-0002-4806-5140; Devignes, Marie-Dominique/0000-0002-0399-8713; Wass, Mark/0000-0001-5428-6479; Falda, Marco/0000-0003-2642-519X; Thurlby, Natalie/0000-0002-1007-0286; Zosa, Elaine/0000-0003-2482-0663; Dessimoz, Christophe/0000-0002-2170-853X; Yunes, Jeffrey/0000-0003-1869-3231; Hamid, Md Nafiz/0000-0001-8681-6526; Hoehndorf, Robert/0000-0001-8149-5890; Dogan, Tunca/0000-0002-1298-9763; NOTARO, MARCO/0000-0003-4309-2200; Cozzetto, Domenico/0000-0001-6752-5432; Lewis, Kimberley/0000-0003-3010-8453; Roche, Daniel/0000-0002-9204-1840; Martin, Maria-Jesus/0000-0001-5454-2815; Tress, Michael/0000-0001-9046-6370; Tolvanen, Martti/0000-0003-3434-7646; Cheng, Jianlin/0000-0003-0305-2853; Rose, Peter/0000-0001-9981-9750; Renaux, Alexandre/0000-0002-4339-2791; Kacsoh, Balint/0000-0001-9171-0611; O'Donovan, Claire/0000-0001-8051-7429; Kulmanov, Maxat/0000-0003-1710-1820; Friedberg, Iddo/0000-0002-1789-8000; Zhou, Naihui/0000-0001-6268-6149. WOS: 000498615000001. PubMed ID: 31744546. Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in terms of the volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. 
We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens. Funding: National Science Foundation (NSF) [DBI1564756, DBI-1458359, DBI-1458390, DMS1614777, CMMI1825941, NSF 1458390]; Gordon and Betty Moore Foundation [GBMF 4552]; National Institutes of Health (NIH) National Institute of General Medical Sciences (NIGMS) [P20 GM113132]; Cystic Fibrosis Foundation [CFRDP STANTO19R0]; Biotechnology and Biological Sciences Research Council (BBSRC) [BB/K004131/1, BB/F00964X/1, BB/M025047/1, BB/M015009/1]; Consejo Nacional de Ciencia y Tecnologia Paraguay (CONACyT) [14-INV-088, PINV15-315]; National Science Foundation (NSF) [1660648, DBI 1759934, IIS1763246, DBI-1458477, 0965768, DMR-1420073, DBI-1458443]; National Institutes of Health (NIH) [R01GM093123, DP1MH110234, UL1 TR002319, U24 TR002306]; Deutsche Forschungsgemeinschaft (DFG, 
German Research Foundation) under Germany's Excellence Strategy, EXC 2155 "RESIST" [39087428]; National Institutes of Health (NIH) [R01GM123055, R01GM60595, R15GM120650, GM083107, GM116960, AI134678, NIH R35-GM128637, R00-GM097033]; European Research Council (ERC) [StG 757700]; Spanish Ministry of Science, Innovation and Universities [BFU2017-89833-P]; Severo Ochoa award; Centre of Excellence project "BioProspecting of Adriatic Sea"; Croatian Government; European Regional Development Fund, European Union (EU) [KK.01.1.1.01.0002]; ATT Tieto kayttoon grant; Academy of Finland; University of Turku; CSC-IT Center for Science Ltd.; University of Miami; National Cancer Institute (NCI) of the National Institutes of Health [U01CA198942]; Helsinki Institute for Life Sciences; Academy of Finland [292589]; National Natural Science Foundation of China [31671367, 31471245, 91631301, 61872094, 61572139]; National Key Research and Development Program of China [2016YFC1000505, 2017YFC0908402]; Italian Ministry of Education, University and Research (MIUR) PRIN 2017 project [2017483NH8]; Shanghai Municipal Science and Technology Major Project [2017SHZDZX01, 2018SHZDZX01]; UK Biotechnology and Biological Sciences Research Council (BBSRC) [BB/N019431/1, BB/L020505/1, BB/L002817/1]; Elsevier; Extreme Science and Engineering Discovery Environment (XSEDE) award [MCB160101, MCB160124]; Ministry of Education, Science and Technological Development of the Republic of Serbia [173001]; Taiwan Ministry of Science and Technology [106-2221-E-004-011-MY2]; Montana State 
University; Bavarian Ministry for Education; Simons Foundation; NIH National Institute of Neurological Disorders and Stroke (NINDS) [1R21NS103831-01]; University of Illinois at Chicago (UIC) Cancer Center award; UIC College of Liberal Arts and Sciences Faculty Award; UIC International Development Award; Yad Hanadiv [9660/2019]; National Institute of General Medical Sciences of the National Institutes of Health [GM066099, GM079656]; Research Supporting Plan (PSR) of the University of Milan [PSR2018-DIP-010-MFRAS]; Swiss National Science Foundation (SNSF) [150654]; EMBL-European Bioinformatics Institute core funds; CAFA BBSRC [BB/N004876/1]; European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant [778247]; COST Action, European Cooperation in Science and Technology (COST) [BM1405]; NIH National Institute of General Medical Sciences (NIGMS) [R01 GM071749]; National Human Genome Research Institute of the National Institutes of Health [U41 HG007234]; INB Grant (ISCIII-SGEFI/ERDF) [PT17/0009/0001]; Scientific and Technological Research Council of Turkey (TUBITAK) [EEEAG-116E930]; KanSil [2016K121540]; Università degli Studi di Milano; 111 Project, Ministry of Education, China [B18015]; key project of Shanghai Science Technology [16JC1420402]; ZJLab; project Ribes Network POR-FESR 3S4H [TOPP-ALFREVE18-01]; PRID/SID of the University of Padova [TOPP-SID19-01]; NIH National Institute of General Medical Sciences (NIGMS) [R15GM120650]; King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) [URF/1/3454-01-01, URF/1/3790-01-01]; "the Human Project from Mind, Brain 
and Learning" of the NCCU Higher Education Sprout Project by the Taiwan Ministry of Education; National Center for High-performance ComputingIstanbul Technical University, The work of IF was funded, in part, by the National Science Foundation award DBI-1458359. The work of CSG and AJL was funded, in part, by the National Science Foundation award DBI-1458390 and GBMF 4552 from the Gordon and Betty Moore Foundation. The work of DAH and KAL was funded, in part, by the National Science Foundation award DBI-1458390, National Institutes of Health NIGMS P20 GM113132, and the Cystic Fibrosis Foundation CFRDP STANTO19R0. The work of AP, HY, AR, and MT was funded by BBSRC grants BB/K004131/1, BB/F00964X/1 and BB/M025047/1, Consejo Nacional de Ciencia y Tecnologia Paraguay (CONACyT) grants 14-INV-088 and PINV15-315, and NSF Advances in BioInformatics grant 1660648. The work of JC was partially supported by an NIH grant (R01GM093123) and two NSF grants (DBI 1759934 and IIS1763246). ACM acknowledges the support by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy -EXC 2155 "RESIST" - Project ID 39087428. DK acknowledges the support from the National Institutes of Health (R01GM123055) and the National Science Foundation (DMS1614777, CMMI1825941). PB acknowledges the support from the National Institutes of Health (R01GM60595). GB and BZK acknowledge the support from the National Science Foundation (NSF 1458390) and NIH DP1MH110234. FS was funded by the ERC StG 757700 "HYPER-INSIGHT" and by the Spanish Ministry of Science, Innovation and Universities grant BFU2017-89833-P. FS further acknowledges the funding from the Severo Ochoa award to the IRB Barcelona. TS was funded by the Centre of Excellence project "BioProspecting of Adriatic Sea", co-financed by the Croatian Government and the European Regional Development Fund (KK.01.1.1.01.0002). The work of SK was funded by ATT Tieto kayttoon grant and Academy of Finland. 
JB and HM acknowledge the support of the University of Turku, the Academy of Finland and CSC - IT Center for Science Ltd. TB and SM were funded by the NIH awards UL1 TR002319 and U24 TR002306. The work of CZ and ZW was funded by the National Institutes of Health R15GM120650 to ZW and start-up funding from the University of Miami to ZW. The work of PWR was supported by the National Cancer Institute of the National Institutes of Health under Award Number U01CA198942. PR acknowledges NSF grant DBI-1458477. PT acknowledges support from the Helsinki Institute for Life Sciences. The work of AJM was funded by the Academy of Finland (No. 292589). The work of FZ and WT was funded by the National Natural Science Foundation of China (31671367, 31471245, 91631301) and the National Key Research and Development Program of China (2016YFC1000505, 2017YFC0908402). CS acknowledges support from the Italian Ministry of Education, University and Research (MIUR) PRIN 2017 project 2017483NH8. SZ is supported by the National Natural Science Foundation of China (No. 61872094 and No. 61572139) and Shanghai Municipal Science and Technology Major Project (No. 2017SHZDZX01). PLF and RLH were supported by the National Institutes of Health (R35-GM128637 and R00-GM097033). JG, DTJ, CW, DC, and RF were supported by the UK Biotechnology and Biological Sciences Research Council (BB/N019431/1, BB/L020505/1, and BB/L002817/1) and Elsevier. The work of YZ and CZ was funded in part by the National Institutes of Health awards GM083107, GM116960, and AI134678; the National Science Foundation award DBI1564756; and the Extreme Science and Engineering Discovery Environment (XSEDE) awards MCB160101 and MCB160124. The work of BG, VP, RD, NS, and NV was funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia, Project No. 173001. The work of YWL, WHL, and JMC was funded by the Taiwan Ministry of Science and Technology (106-2221-E-004-011-MY2).
YWL, WHL, and JMC further acknowledge support from "the Human Project from Mind, Brain and Learning" of the NCCU Higher Education Sprout Project by the Taiwan Ministry of Education and the National Center for High-performance Computing for computer time and facilities. The work of IK and AB was funded by Montana State University and the NSF Advances in Biological Informatics program through grant number 0965768. BR, TG, and JR are supported by the Bavarian Ministry for Education through funding to the TUM. The work of RB, VG, MB, and DCEK was supported by the Simons Foundation, NIH NINDS grant number 1R21NS103831-01 and NSF award number DMR-1420073. CJJ acknowledges funding from a University of Illinois at Chicago (UIC) Cancer Center award, a UIC College of Liberal Arts and Sciences Faculty Award, and a UIC International Development Award. The work of ML was funded by Yad Hanadiv (grant number 9660/2019). The work of OL and IN was funded by the National Institute of General Medical Sciences of the National Institutes of Health through GM066099 and GM079656. Additional support came from the Research Supporting Plan (PSR) of the University of Milan, number PSR2018-DIP-010-MFRAS. AWV acknowledges funding from the BBSRC (CASE studentship BB/M015009/1). CD acknowledges support from the Swiss National Science Foundation (150654). CO and MJM are supported by the EMBL-European Bioinformatics Institute core funds and the CAFA BBSRC BB/N004876/1. GG is supported by CAFA BBSRC BB/N004876/1. SCET acknowledges funding from the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No 778247 (IDPfun) and from COST Action BM1405 (NGP-net). SEB was supported by NIH/NIGMS grant R01 GM071749. The work of MLT, JMR, and JMF was supported by the National Human Genome Research Institute of the National Institutes of Health, grant number U41 HG007234. The work of JMF and JMR was also supported by INB Grant (PT17/0009/0001 - ISCIII-SGEFI/ERDF).
VA acknowledges funding from TUBITAK EEEAG-116E930. RCA acknowledges funding from KanSil 2016K121540. GV acknowledges funding from Universita degli Studi di Milano - Project "Discovering Patterns in Multi-Dimensional Data" and Project "Machine Learning and Big Data Analysis for Bioinformatics". RY and SY are supported by the 111 Project (No. B18015), the key project of Shanghai Science & Technology (No. 16JC1420402), Shanghai Municipal Science and Technology Major Project (No. 2018SHZDZX01), and ZJLab. ST was supported by project Ribes Network POR-FESR 3S4H (No. TOPP-ALFREVE18-01) and PRID/SID of University of Padova (No. TOPP-SID19-01). CZ and ZW were supported by the NIGMS grant R15GM120650 to ZW and start-up funding from the University of Miami to ZW. The work of MK and RH was supported by funding from King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. URF/1/3454-01-01 and URF/1/3790-01-01. The work of SDM is funded, in part, by NSF award DBI-1458443.
Published: 2019

24. Exploring the sequence, function, and evolutionary space of protein superfamilies using sequence similarity networks and phylogenetic reconstructions

Author: Dave W. Anderson, Patricia C. Babbitt, Eyal Akiva, Nobuhiko Tokuriki, and Janine N. Copp
Subjects: Functional diversity, Phylogenetic tree, Similarity (network science), A protein, SUPERFAMILY, Computational biology, Biology, Function (biology), Functional divergence, Sequence (medicine)
Abstract: Integrative computational methods can facilitate the discovery of new protein functions and enzymatic reactions by enabling the observation and investigation of complex sequence-structure-function and evolutionary relationships within protein superfamilies. Here, we highlight the use of sequence similarity networks (SSNs) and phylogenetic reconstructions to map the functional divergence and evolutionary history of protein superfamilies. We exemplify this approach using the nitroreductase (NTR) flavoenzyme superfamily, demonstrating that SSN investigations can provide a rapid and effective means to classify groups of proteins, expose sequence similarity relationships across the global scale of a protein superfamily, and efficiently support detailed phylogenetic analyses. Integration of such approaches with systematic experimental characterization will expand our understanding of the functional diversity of enzymes, their evolution, and their associated physiological roles.
Published: 2019

25. InterPro in 2019: improving coverage, classification and access to protein sequence annotations

Author: Ian Sillitoe, Rodrigo Lopez, Arun Prasad Pandurangan, Christian J. A. Sigrist, Lorna Richardson, Sara El-Gebali, Matloob Qureshi, Hsin-Yu Chang, Nicole Redaschi, Narmada Thanki, Robert D. Finn, Typhaine Paysan-Lafosse, Darren A. Natale, Silvio C. E. Tosatto, Simon C. Potter, Gift Nuka, Huaiyu Mi, Gustavo A. Salazar, Julian Gough, Paul Thomas, David R. Haft, Matthew Fraser, Christine A. Orengo, Alex L. Mitchell, Fábio Madeira, Sebastien Pesseat, Amaia Sangrador-Vegas, Siew-Yit Yong, Catherine Rivoire, Shoshana D. Brown, Aurelien Luciani, Alan Bridge, Peer Bork, Aron Marchler-Bauer, Patricia C. Babbitt, Hongzhan Huang, Matthias Blum, Neil D. Rawlings, Teresa K. Attwood, Ivica Letunic, Granger G. Sutton, and Marco Necci
Subjects: InterPro, Biology, 03 medical and health sciences, User-Computer Interface, 0302 clinical medicine, Protein sequencing, Protein Domains, Databases, Genetic, Genetics, Animals, Humans, Database Issue, Entry type, Databases, Protein, 030304 developmental biology, 0303 health sciences, Internet, Information retrieval, Sequence Homology, Amino Acid, Molecular Sequence Annotation, Data access, Gene Ontology, UniProt Knowledgebase, Multigene Family, UniProt, 030217 neurology & neurosurgery, Software
Abstract: The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.
Published: 2018

26. Atlas of the Radical SAM Superfamily: Divergent Evolution of Function Using a 'Plug and Play' Domain

Author: Gemma L. Holliday, Elaine C. Meng, Patricia C. Babbitt, Shoshana D. Brown, Ursula Pieper, Sara Calhoun, Eyal Akiva, Andrej Sali, and Squire J. Booker
Subjects: 0301 basic medicine, S-Adenosylmethionine, Free Radicals, Computational Biology, SUPERFAMILY, Computational biology, Phylogenetic distribution, Article, Enzymes, Divergent evolution, Evolution, Molecular, 03 medical and health sciences, Structure-Activity Relationship, 030104 developmental biology, Protein Domains, Motif (music), Amino Acid Sequence, Radical SAM, Sequence Alignment, Biogenesis, Phylogeny
Abstract: The radical SAM superfamily contains over 100,000 homologous enzymes that catalyze a remarkably broad range of reactions required for life, including metabolism, nucleic acid modification, and biogenesis of cofactors. While the highly conserved SAM-binding motif responsible for formation of the key 5′-deoxyadenosyl radical intermediate is a key structural feature that simplifies identification of superfamily members, our understanding of their structure–function relationships is complicated by the modular nature of their structures, which exhibit varied and complex domain architectures. To gain new insight about these relationships, we classified the entire set of sequences into similarity-based subgroups that could be visualized using sequence similarity networks. This superfamily-wide analysis reveals important features that had not previously been appreciated from studies focused on one or a few members. Functional information mapped to the networks indicates which members have been experimentally or structurally characterized, their known reaction types, and their phylogenetic distribution. Despite the biological importance of radical SAM chemistry, the vast majority of superfamily members have never been experimentally characterized in any way, suggesting that many new reactions remain to be discovered. In addition to 20 subgroups with at least one known function, we identified additional subgroups made up entirely of sequences of unknown function. Importantly, our results indicate that even general reaction types fail to track well with our sequence similarity-based subgroupings, raising major challenges for function prediction for currently identified and new members that continue to be discovered. Interactive similarity networks and other data from this analysis are available from the Structure-Function Linkage Database.
Published: 2018

27. Revealing Unexplored Sequence-Function Space Using Sequence Similarity Networks

Author: Patricia C. Babbitt, Eyal Akiva, Nobuhiko Tokuriki, and Janine N. Copp
Subjects: 0301 basic medicine, Protein function, Function space, Computer science, Repertoire, Computational Biology, Proteins, SUPERFAMILY, Computational biology, Nitroreductases, Biochemistry, Evolution, Molecular, 03 medical and health sciences, Functional diversity, Structure-Activity Relationship, 030104 developmental biology, Similarity (psychology), Amino Acid Sequence, Databases, Protein, Peptide sequence, Sequence (medicine)
Abstract: The rapidly expanding number of protein sequences found in public databases can improve our understanding of how protein functions evolve. However, our current knowledge of protein function likely represents a small fraction of the diverse repertoire that exists in nature. Integrative computational methods can facilitate the discovery of new protein functions and enzymatic reactions through the observation and investigation of the complex sequence-structure-function relationships within protein superfamilies. Here, we highlight the use of sequence similarity networks (SSNs) to identify previously unexplored sequence and function space. We exemplify this approach using the nitroreductase (NTR) superfamily. We demonstrate that SSN investigations can provide a rapid and effective means to classify groups of proteins, therefore exposing experimentally unexplored sequences that may exhibit novel functionality. Integration of such approaches with systematic experimental characterization will expand our understanding of the functional diversity of enzymes and their associated physiological roles.
Published: 2018

28. Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity

Author: Janelle B. Leuthaeuser, Stacy T. Knutson, Patricia C. Babbitt, Kiran Kumar, and Jacquelyn S. Fetrow
Subjects: Molecular Sequence Data, Computational biology, Bioinformatics, Network topology, Biochemistry, protein similarity network analysis, Comparison of topologies, Annotation, Structure-Activity Relationship, network-based clustering, Catalytic Domain, Cluster Analysis, Amino Acid Sequence, Protein Interaction Maps, Cluster analysis, Databases, Protein, Molecular Biology, Peptide sequence, biology, active site profiling, function annotation transfer, Active site, Computational Biology, Proteins, Molecular Sequence Annotation, Articles, similarity-based clustering, Structure-Function Linkage Database (SFLD), Cellular Microenvironment, biology.protein, protein function annotation, Protein network
Abstract: The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods.
Published: 2015

29. Determinants of the CmoB carboxymethyl transferase utilized for selective tRNA wobble modification

Author: Yikai Wang, Keisha Thomas, Hui Xiao, Steven C. Almo, Shoshana D. Brown, Patricia C. Babbitt, Jeffrey B. Bonanno, Jungwook Kim, Junseock Koh, and Young-Sam Lee
Subjects: S-Adenosylmethionine, Mutation, Binding Sites, Nucleic Acid Enzymes, Escherichia coli Proteins, Translation (biology), Methyltransferases, Wobble base pair, Biology, Ligands, medicine.disease_cause, Genetic code, RNA, Transfer, Biochemistry, Transfer RNA, Genetics, medicine, Thermodynamics, Transferase, Binding site, Function (biology)
Abstract: Enzyme-mediated modifications at the wobble position of tRNAs are essential for the translation of the genetic code. We report the genetic, biochemical and structural characterization of CmoB, the enzyme that recognizes the unique metabolite carboxy-S-adenosine-L-methionine (Cx-SAM) and catalyzes a carboxymethyl transfer reaction resulting in formation of 5-oxyacetyluridine at the wobble position of tRNAs. CmoB is distinctive in that it is the only known member of the SAM-dependent methyltransferase (SDMT) superfamily that utilizes a naturally occurring SAM analog as the alkyl donor to fulfill a biologically meaningful function. Biochemical and genetic studies define the in vitro and in vivo selectivity for Cx-SAM as alkyl donor over the vastly more abundant SAM. Complementary high-resolution structures of the apo- and Cx-SAM bound CmoB reveal the determinants responsible for this remarkable discrimination. Together, these studies provide mechanistic insight into the enzymatic and non-enzymatic features of this alkyl transfer reaction, which affords the broadened specificity required for tRNAs to recognize multiple synonymous codons.
Published: 2015

30. Biocuration in the structure–function linkage database: the anatomy of a superfamily

Author: Shoshana D. Brown, Scott C.-H. Pegg, Elaine C. Meng, Conrad C. Huang, Patricia C. Babbitt, Eyal Akiva, Michael A. Hicks, John H. Morris, David Mischel, Gemma L. Holliday, and Thomas E. Ferrin
Subjects: 0301 basic medicine, Biology, computer.software_genre, General Biochemistry, Genetics and Molecular Biology, 03 medical and health sciences, Structure-Activity Relationship, Data sequences, Databases, Protein, Sequence (medicine), Linkage (software), Database, Structure function, SUPERFAMILY, Molecular Sequence Annotation, Enzymes, 030104 developmental biology, Gene Ontology, Structural Homology, Protein, Ontology, Original Article, General Agricultural and Biological Sciences, Corrigendum, computer, Chemical function, Information Systems
Abstract: With ever-increasing amounts of sequence data available in both the primary literature and sequence repositories, there is a bottleneck in annotating molecular function to a sequence. This article describes the biocuration process and methods used in the structure-function linkage database (SFLD) to help address some of the challenges. We discuss how the hierarchy within the SFLD allows us to infer detailed functional properties for functionally diverse enzyme superfamilies in which all members are homologous, conserve an aspect of their chemical function and have associated conserved structural features that enable the chemistry. Also presented is the Enzyme Structure-Function Ontology (ESFO), which has been designed to capture the relationships between enzyme sequence, structure and function that underlie the SFLD and is used to guide the biocuration processes within the SFLD. Database URL: http://sfld.rbvi.ucsf.edu/
Published: 2017

31. An Atlas of Peroxiredoxins Created Using an Active Site Profile-Based Approach to Functionally Relevant Clustering of Proteins

Author: Jacquelyn S. Fetrow, Patricia C. Babbitt, Angela F. Harper, Leslie B. Poole, Thomas E. Ferrin, John H. Morris, Janelle B. Leuthaeuser, and Christine A. Orengo
Subjects: 0301 basic medicine, Sequence Homology, Bioinformatics, Biochemistry, Mathematical Sciences, Database and Informatics Methods, Protein Structure Databases, Protein structure, Sequence Analysis, Protein, Protein Interaction Mapping, Macromolecular Structure Analysis, Database Searching, Databases, Protein, lcsh:QH301-705.5, Peptide sequence, Data Management, Ecology, Biological Sciences, Phylogenetics, Amino Acid, Computational Theory and Mathematics, Multigene Family, Modeling and Simulation, GenBank, Sequence Analysis, Research Article, Protein Binding, Protein Structure, Computer and Information Sciences, Sequence analysis, Molecular Sequence Data, Sequence Databases, Computational biology, Biology, Research and Analysis Methods, Databases, 03 medical and health sciences, Cellular and Molecular Neuroscience, Sequence Motif Analysis, Information and Computing Sciences, Genetics, Evolutionary Systematics, Amino Acid Sequence, Sequence Similarity Searching, Cluster analysis, Molecular Biology, Ecology, Evolution, Behavior and Systematics, Taxonomy, Evolutionary Biology, Binding Sites, Sequence Homology, Amino Acid, Protein, Biology and Life Sciences, Proteins, Peroxiredoxins, Protein superfamily, High-Throughput Screening Assays, Hierarchical clustering, Enzyme Activation, Biological Databases, 030104 developmental biology, lcsh:Biology (General), Database Management Systems
Abstract: Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, to add sequences containing similar functional site features, and divisive, to split groups when functional site features suggest distinct functionally-relevant clusters. Superfamily members need not be identified initially—MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method’s novelty lies in the manner in which isofunctional groups are selected; rather than use a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily. 
The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences.
Author Summary: Peroxiredoxins (Prxs) are a large, ubiquitous superfamily of proteins that are arguably the most important reductants of peroxide in biological systems. These proteins are involved in a diverse array of essential cellular functions, including peroxide reduction, signal transduction, circadian rhythms, chaperone function and apoptosis. Previously, Prxs have been classified multiple ways, based on biological role and evolutionary analysis. A more detailed expertly curated analysis identified six functionally relevant Prx classes and identified over 3500 proteins in these six classes; this set provides a validation for molecular function annotation methods. It is well-known that automated molecular functional annotation for individual protein sequences is difficult without detailed manual curation. In this work, we address this deficiency in available technologies by presenting a novel iterative method, MISST, for agglomeratively identifying superfamily members and clustering them into functionally relevant groups. Using this potentially automatable approach, 38,739 Prx sequences were identified from GenBank. MISST identified six functionally relevant clusters from these sequences, matching those previously identified by experts. Key mechanistic determinants and organismal distribution are explored. This analysis provides a significantly more complete understanding of this biologically important protein superfamily; the method lays a foundation for automated functionally relevant clustering of the protein universe.
Published: 2017

32. InterPro in 2017-beyond protein family and domain annotations

Author: Ian Sillitoe, Hsin-Yu Chang, Sara El-Gebali, Siew Yit Young, Youngmi Park, Jaina Mistry, Sebastien Pesseat, Narmada Thanki, Cathy H. Wu, Alan Bridge, Ioannis Xenarios, Ben Smithers, Simon C. Potter, Silvano Squizzato, Nicole Redaschi, Darren A. Natale, Paul Thomas, David R. Haft, Ivica Letunic, Granger G. Sutton, Silvio C. E. Tosatto, Gift Nuka, Gemma L. Holliday, Alex L. Mitchell, Teresa K. Attwood, Marco Necci, Neil D. Rawlings, Shennan Lu, Julian Gough, Alex Bateman, Patricia C. Babbitt, Lorna Richardson, Lai-Su L. Yeh, Amaia Sangrador-Vegas, Zsuzsanna Dosztányi, Rodrigo Lopez, Damiano Piovesan, Matthew Fraser, Aron Marchler-Bauer, Robert D. Finn, Peer Bork, Christine A. Orengo, Xiaosong Huang, Huaiyu Mi, Hongzhan Huang, Christian J. A. Sigrist, and Catherine Rivoire
Subjects: 0301 basic medicine, InterPro, Protein family, Simple Modular Architecture Research Tool, Computational biology, Biology, Bioinformatics, Domain (software engineering), Databases, 03 medical and health sciences, Annotation, Information and Computing Sciences, Genetics, Database Issue, Humans, Protein Interaction Domains and Motifs, Databases, Protein, Phylogeny, 030102 biochemistry & molecular biology, Protein, Computational Biology, Molecular Sequence Annotation, Biological Sciences, 030104 developmental biology, UniProt Knowledgebase, Generic health relevance, Software, Environmental Sciences, Developmental Biology, InterProScan
Abstract: InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.
Published: 2017

33. 3D Motifs

Author: Jerome P. Nilmeier, Elaine C. Meng, Benjamin J. Polacco, and Patricia C. Babbitt
Subjects: 0301 basic medicine, 03 medical and health sciences, 030104 developmental biology, 030102 biochemistry & molecular biology
Published: 2017

34. New Insights about Enzyme Evolution from Large Scale Studies of Sequence and Structure Relationships

Author: Patricia C. Babbitt and Shoshana D. Brown
Subjects: Substrate Specificities, Biology, Biochemistry, Molecular Evolution, Evolution, Molecular, Sequence Analysis, Protein, Molecular evolution, Phylogenetics, Catalytic Domain, Protein Evolution, Humans, Molecular Biology, Phylogeny, Sequence (medicine), chemistry.chemical_classification, Genetics, Structure (mathematical logic), Scale (chemistry), Computational Biology, Minireviews, SUPERFAMILY, Cell Biology, Enzymes, Enzyme Structure-Function Relationships, Enzyme, chemistry, Evolutionary biology, Multifunctional Enzyme, Biocatalysis, Enzyme Evolution, Networks, Enzyme Superfamily, human activities, Oxidation-Reduction
Abstract: Understanding how enzymes have evolved offers clues about their structure-function relationships and mechanisms. Here, we describe evolution of functionally diverse enzyme superfamilies, each representing a large set of sequences that evolved from a common ancestor and that retain conserved features of their structures and active sites. Using several examples, we describe the different structural strategies nature has used to evolve new reaction and substrate specificities in each unique superfamily. The results provide insight about enzyme evolution that is not easily obtained from studies of one or only a few enzymes.
Published: 2014

35. Prediction of Substrates for Glutathione Transferases by Covalent Docking

Author: Patricia C. Babbitt, Nir London, Guang Qiang Dong, Richard N. Armstrong, Chakrapani Kalyanaraman, Brian K. Shoichet, Hao Fan, Andrej Sali, Matthew P. Jacobson, Sara Calhoun, Megan C. Branch, and Susan T. Mashiyama
Subjects: Stereochemistry, General Chemical Engineering, Library and Information Sciences, Crystallography, X-Ray, Ligands, Article, Substrate Specificity, chemistry.chemical_compound, Catalytic Domain, Animals, Humans, KEGG, Databases, Protein, Glutathione Transferase, chemistry.chemical_classification, Virtual screening, Binding Sites, biology, Chemistry, Ligand, Active site, General Chemistry, Glutathione, Computer Science Applications, Molecular Docking Simulation, Enzyme, Biochemistry, Docking (molecular), Covalent bond, biology.protein
Abstract: Enzymes in the glutathione transferase (GST) superfamily catalyze the conjugation of glutathione (GSH) to electrophilic substrates. As a consequence, they are involved in a number of key biological processes, including protection of cells against chemical damage, steroid and prostaglandin biosynthesis, tyrosine catabolism, and cell apoptosis. Although virtual screening has been used widely to discover substrates by docking potential noncovalent ligands into active site clefts of enzymes, docking has rarely been constrained by a covalent bond between the enzyme and ligand. In this study, we investigate the accuracy of docking poses and substrate discovery in the GST superfamily by docking 6738 potential ligands from the KEGG and MetaCyc compound libraries into 14 representative GST enzymes with known structures and substrates using the PLOP program [Jacobson et al., Proteins 2004, 55, 351; PMID 15048827]. For X-ray structures as receptors, one of the top 3 ranked models is within 3 Å all-atom root mean square deviation (RMSD) of the native complex in 11 of the 14 cases; the enrichment LogAUC value is better than random in all cases, and better than 25 in 7 of 11 cases. For comparative models as receptors, near-native ligand–enzyme configurations are often sampled but difficult to rank highly. For models based on templates with the highest sequence identity, the enrichment LogAUC is better than 25 in 5 of 11 cases, not significantly different from the crystal structures. In conclusion, we show that covalent docking can be a useful tool for substrate discovery and point out specific challenges for future method improvement.
Published: 2014

36. Mechanistic and Bioinformatic Investigation of a Conserved Active Site Helix in α-Isopropylmalate Synthase from Mycobacterium tuberculosis, a Member of the DRE-TIM Metallolyase Superfamily

Author: Michael A. Hicks, Jordyn L. Johnson, Ashley K. Casey, Patricia C. Babbitt, and Patrick A. Frantom
Subjects: Models, Molecular, Allosteric regulation, Sequence alignment, Arginine, Biochemistry, Article, Catalysis, Protein Structure, Secondary, Protein structure, Allosteric Regulation, Leucine, Catalytic Domain, Amino Acid Sequence, Binding site, Peptide sequence, Binding Sites, biology, Aldolase A, Active site, Computational Biology, Mycobacterium tuberculosis, Lyase, Kinetics, Amino Acid Substitution, biology.protein, 2-Isopropylmalate Synthase, Sequence Alignment
Abstract: The characterization of functionally diverse enzyme superfamilies provides the opportunity to identify evolutionarily conserved catalytic strategies, as well as amino acid substitutions responsible for the evolution of new functions or specificities. Isopropylmalate synthase (IPMS) belongs to the DRE-TIM metallolyase superfamily. Members of this superfamily share common active site elements, including a conserved active site helix and an HXH divalent metal binding motif, associated with stabilization of a common enolate anion intermediate. These common elements are overlaid by variations in active site architecture resulting in the evolution of a diverse set of reactions that include condensation, lyase/aldolase, and carboxyl transfer activities. Here, an integrated biochemical and bioinformatics approach has been applied to IPMS to investigate the catalytic role of residues on an active site helix that is conserved across the superfamily. The construction of a sequence similarity network for the DRE-TIM metallolyase superfamily allows the biochemical results obtained with IPMS variants to be compared across superfamily members and within other condensation-catalyzing enzymes related to IPMS. A comparison of our results with previous biochemical data indicates that an active site arginine residue (R80 in IPMS) is strictly required for activity across the superfamily, suggesting that it plays a key role in catalysis, most likely through enolate stabilization. In contrast, differential results obtained from substitution of the C-terminal residue of the helix (Q84 in IPMS) suggest that this residue plays a role in reaction specificity within the superfamily.
Published: 2014

37. An approach to functionally relevant clustering of the protein universe: Active site profile-based clustering of protein structures and sequences

Author: Stacy T. Knutson, Brian M. Westwood, Janelle B. Leuthaeuser, Brandon E. Turner, Don Nguyendac, Gabrielle Shea, Kiran Kumar, Julia D. Hayden, Angela F. Harper, Shoshana D. Brown, John H. Morris, Thomas E. Ferrin, Patricia C. Babbitt, and Jacquelyn S. Fetrow
Subjects: active site profile, isofunctional clusters, Sequence Analysis, Protein, Phosphopyruvate Hydratase, functionally relevant clustering, Articles, function annotation, functional site profile, Databases, Protein, misannotation, mechanistic determinants, Article, Glutathione Transferase
Abstract: Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow identification of mechanistic determinants: amino acids that distinguish functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two‐Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure‐Function Linkage Database, SFLD) self‐identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self‐identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well‐curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP‐identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F‐measure and performance analysis on the enolase search results and comparison to GEMMA and SCI‐PHY demonstrate that TuLIP avoids the over‐division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results.
Published: 2016

38. Evolutionary Reprograming of Protein-Protein Interaction Specificity

Author: Eyal Akiva and Patricia C. Babbitt
Subjects: Genetics, chemistry.chemical_classification, biology, Biochemistry, Genetics and Molecular Biology(all), Mesorhizobium, biology.organism_classification, Article, General Biochemistry, Genetics and Molecular Biology, Deep sequencing, Protein–protein interaction, Evolution, Molecular, Enzyme, chemistry, Mutation (genetic algorithm), Protein Interaction Maps, Protein Interaction Map
Abstract: Interacting proteins typically coevolve, and the identification of coevolving amino acids can pinpoint residues required for interaction specificity. This approach often assumes that an interface-disrupting mutation in one protein drives selection of a compensatory mutation in its partner during evolution. However, this model requires a non-functional intermediate state prior to the compensatory change. Alternatively, a mutation in one protein could first broaden its specificity, allowing changes in its partner, followed by a specificity-restricting mutation. Using bacterial toxin-antitoxin systems, we demonstrate the plausibility of this second, promiscuity-based model. By screening large libraries of interface mutants, we show that toxins and antitoxins with high specificity are frequently connected in sequence space to more promiscuous variants that can serve as intermediates during a reprogramming of interaction specificity. We propose that the abundance of promiscuous variants promotes the expansion and diversification of toxin-antitoxin systems and other paralogous protein families during evolution.
Published: 2015
Full Text: View/download PDF

39. The Structure–Function Linkage Database

Author: Shoshana D. Brown, John H. Morris, Patricia C. Babbitt, Alexandra M. Schnoes, Ashley F. Custer, Gemma L. Holliday, Susan T. Mashiyama, Jeffrey M. Yunes, Alan E. Barber, Florian Lauck, Michael A. Hicks, Doug Stryke, Thomas E. Ferrin, Elaine C. Meng, Eyal Akiva, Sunil Ojha, Conrad C. Huang, David Mischel, Daniel Almonacid, and Massachusetts Institute of Technology. Department of Chemical Engineering
Subjects: Sequence alignment, Context (language use), Linkage (mechanical), Biology, computer.software_genre, law.invention, Databases, Structure-Activity Relationship, Similarity (network science), law, Information and Computing Sciences, Genetics, Databases, Protein, III. Metabolic and signalling pathways, enzymes, Sequence (medicine), Internet, Database, Protein, Structure function, Molecular Sequence Annotation, SUPERFAMILY, Biological Sciences, Enzymes, Generic health relevance, Sequence Alignment, computer, Environmental Sciences, Developmental Biology
Abstract: The Structure-Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure-function relationships for functionally diverse enzyme superfamilies. Members of such superfamilies are diverse in their overall reactions yet share a common ancestor and some conserved active site features associated with conserved functional attributes such as a partial reaction. Thus, despite their different functions, members of these superfamilies 'look alike', making them easy to misannotate. To address this complexity and enable rational transfer of functional features to unknowns only for those members for which we have sufficient functional information, we subdivide superfamily members into subgroups using sequence information, and lastly into families, sets of enzymes known to catalyze the same reaction using the same mechanistic strategy. Browsing and searching options in the SFLD provide access to all of these levels. The SFLD offers manually curated as well as automatically classified superfamily sets, both accompanied by search and download options for all hierarchical levels. Additional information includes multiple sequence alignments, tab-separated files of functional and other attributes, and sequence similarity networks. The latter provide a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity. © 2013 The Author(s). Published by Oxford University Press.
Published: 2013

40. Discovery of new enzymes and metabolic pathways by using structure and genome context

Author: John E. Cronan, Matthew P. Jacobson, Suwen Zhao, B. Hillerich, Jonathan V. Sweedler, Ayano Sakai, Patricia C. Babbitt, Steven C. Almo, Matthew W. Vetting, Shoshana D. Brown, R.D. Seidel, John A. Gerlt, Jeffery B. Bonanno, Ritesh Kumar, and B. McKay Wood
Subjects: Genetics, 0303 health sciences, Multidisciplinary, In silico, 030302 biochemistry & molecular biology, Genome project, Biology, Genome, Article, Gene expression profiling, 03 medical and health sciences, Metabolic pathway, Docking (molecular), Gene cluster, Gene, 030304 developmental biology
Abstract: Assigning valid functions to proteins identified in genome projects is challenging: overprediction and database annotation errors are the principal concerns. We and others are developing computation-guided strategies for functional discovery with 'metabolite docking' to experimentally derived or homology-based three-dimensional structures. Bacterial metabolic pathways often are encoded by 'genome neighbourhoods' (gene clusters and/or operons), which can provide important clues for functional assignment. We recently demonstrated the synergy of docking and pathway context by 'predicting' the intermediates in the glycolytic pathway in Escherichia coli. Metabolite docking to multiple binding proteins and enzymes in the same pathway increases the reliability of in silico predictions of substrate specificities because the pathway intermediates are structurally similar. Here we report that structure-guided approaches for predicting the substrate specificities of several enzymes encoded by a bacterial gene cluster allowed the correct prediction of the in vitro activity of a structurally characterized enzyme of unknown function (PDB 2PMQ), 2-epimerization of trans-4-hydroxy-L-proline betaine (tHyp-B) and cis-4-hydroxy-D-proline betaine (cHyp-B), and also the correct identification of the catabolic pathway in which Hyp-B 2-epimerase participates. The substrate-liganded pose predicted by virtual library screening (docking) was confirmed experimentally. The enzymatic activities in the predicted pathway were confirmed by in vitro assays and genetic analyses; the intermediates were identified by metabolomics; and repression of the genes encoding the pathway by high salt concentrations was established by transcriptomics, confirming the osmolyte role of tHyp-B. This study establishes the utility of structure-guided functional predictions to enable the discovery of new metabolic pathways.
Published: 2013

41. Structure-guided discovery of the metabolite carboxy-SAM that modulates tRNA function

Author: Jeffrey B. Bonanno, Young-Sam Lee, Yury Patskovsky, Steven C. Almo, Shoshana D. Brown, Xiangying Tang, Patricia C. Babbitt, Chakrapani Kalyanaraman, Matthew P. Jacobson, Jungwook Kim, Hui Xiao, and Nawar Al-Obaidi
Subjects: Models, Molecular, S-Adenosylmethionine, Cyclohexanecarboxylic Acids, Context (language use), Wobble base pair, Biology, Crystallography, X-Ray, Ligands, 01 natural sciences, Protein Structure, Secondary, Article, Structural genomics, 03 medical and health sciences, Protein structure, RNA, Transfer, Catalytic Domain, Cyclohexenes, Escherichia coli, Transferase, Uridine, One-Carbon Group Transferases, 030304 developmental biology, 0303 health sciences, Multidisciplinary, 010405 organic chemistry, Escherichia coli Proteins, food and beverages, RNA, Methyltransferases, Biosynthetic Pathways, 0104 chemical sciences, Molecular Weight, RNA, Bacterial, Biochemistry, Transfer RNA, Biocatalysis, Protein Multimerization, Function (biology)
Abstract: Identifying novel metabolites and characterizing their biological functions are major challenges of the post-genomic era. X-ray crystallography can reveal unanticipated ligands which persist through purification and crystallization. These adventitious protein:ligand complexes provide insights into new activities, pathways and regulatory mechanisms. We describe a new metabolite, carboxy-S-adenosylmethionine (Cx-SAM), its biosynthetic pathway and its role in tRNA modification. The structure of CmoA, a member of the SAM-dependent methyltransferase superfamily, revealed a ligand in the catalytic site consistent with Cx-SAM. Mechanistic analyses demonstrated an unprecedented role for prephenate as the carboxyl donor and the involvement of a unique ylide intermediate as the carboxyl acceptor in the CmoA-mediated conversion of SAM to Cx-SAM. A second member of the SAM-dependent methyltransferase superfamily, CmoB, recognizes Cx-SAM and acts as a carboxymethyltransferase to convert 5-hydroxyuridine (ho⁵U) into 5-oxyacetyl uridine (cmo⁵U) at the wobble position of multiple tRNAs in Gram-negative bacteria [1], resulting in expanded codon-recognition properties [2, 3]. CmoA and CmoB represent the first documented synthase and transferase for Cx-SAM. These findings reveal new functional diversity in the SAM-dependent methyltransferase superfamily and expand the metabolic and biological contributions of SAM-based biochemistry. These discoveries highlight the value of structural genomics approaches for identifying ligands in the context of their physiologically relevant macromolecular binding partners and for aiding in functional assignment.
Published: 2013

42. DASP3: identification of protein sequences belonging to functionally relevant groups

Author: Patricia C. Babbitt, Angela F. Harper, Thomas E. Ferrin, Janelle B. Leuthaeuser, Jacquelyn S. Fetrow, and John H. Morris
Subjects: 0301 basic medicine, Bioinformatics, Amino Acid Motifs, Computational biology, Biology, computer.software_genre, Biochemistry, Mathematical Sciences, Set (abstract data type), 03 medical and health sciences, Databases, Protein structure, Similarity (network science), Sequence Analysis, Protein, Structural Biology, Catalytic Domain, Information and Computing Sciences, False positive paradox, Cluster Analysis, Amino Acid Sequence, Databases, Protein, Cluster analysis, Molecular Biology, Sequence, Active site profiling, Applied Mathematics, Protein, Prevention, Proteins, Biological Sciences, Misannotation, Computer Science Applications, Functionally relevant clustering, Identification (information), 030104 developmental biology, Protein function annotation, Data mining, Generic health relevance, DNA microarray, computer, Sequence Analysis, Software, Algorithms
Abstract: BACKGROUND: Development of automatable processes for clustering proteins into functionally relevant groups is a critical hurdle as an increasing number of sequences are deposited into databases. Experimental function determination is exceptionally time-consuming and can’t keep pace with the identification of protein sequences. A tool, DASP (Deacon Active Site Profiler), was previously developed to identify protein sequences with active site similarity to a query set. Development of two iterative, automatable methods for clustering proteins into functionally relevant groups exposed algorithmic limitations in DASP. RESULTS: The accuracy and efficiency of DASP were significantly improved through six algorithmic enhancements implemented in two stages: DASP2 and DASP3. Validation demonstrated that DASP3 provides greater score separation between true positives and false positives than earlier versions. In addition, DASP3 shows similar performance to previous versions in clustering protein structures into isofunctional groups (validated against manual curation), but DASP3 gathers and clusters protein sequences into isofunctional groups more efficiently than DASP and DASP2. CONCLUSIONS: DASP algorithmic enhancements resulted in improved efficiency and accuracy of identifying proteins that contain active site features similar to those of the query set. These enhancements provide incremental improvement in structure database searches and initial sequence database searches; however, the enhancements show significant improvement in iterative sequence searches, suggesting DASP3 is an appropriate tool for the iterative processes required for clustering proteins into isofunctional groups. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1295-z) contains supplementary material, which is available to authorized users.
Published: 2016

43. Evaluating Functional Annotations of Enzymes Using the Gene Ontology

Author: Eyal Akiva, Patricia C. Babbitt, Rebecca Davidson, and Gemma L. Holliday
Subjects: 0301 basic medicine, Linkage (software), chemistry.chemical_classification, Computer science, Gene ontology, A protein, Computational biology, Bioinformatics, Open Biomedical Ontologies, 03 medical and health sciences, Identification (information), 030104 developmental biology, Enzyme, Molecular Sequence Annotation, chemistry, Similarity (psychology), Nucleic acid, Gene, Function (biology)
Abstract: The Gene Ontology (GO) (Ashburner et al., Nat Genet 25(1):25-29, 2000) is a powerful tool in the informatics arsenal of methods for evaluating annotations in a protein dataset. Identifying the nearest well-annotated homologue of a protein of interest, predicting where misannotation has occurred, and knowing how confident you can be in the annotations assigned to those proteins are all critical. In this chapter we explore what makes an enzyme unique and how we can use GO to infer aspects of protein function based on sequence similarity. These can range from identification of misannotation or other errors in a predicted function to accurate function prediction for an enzyme of entirely unknown function. Although GO annotation applies to any gene product, we focus here on describing our approach for hierarchical classification of enzymes in the Structure-Function Linkage Database (SFLD) (Akiva et al., Nucleic Acids Res 42(Database issue):D521-530, 2014) as a guide for informed utilisation of annotation transfer based on GO terms.
Published: 2016

44. An expanded evaluation of protein function prediction methods shows an improvement in accuracy

Author: Yuxiang Jiang, Tal Ronnen Oron, Wyatt T. Clark, Asma R. Bankapur, Daniel D’Andrea, Rosalba Lepore, Christopher S. Funk, Indika Kahanda, Karin M. Verspoor, Asa Ben-Hur, Da Chen Emily Koo, Duncan Penfold-Brown, Dennis Shasha, Noah Youngs, Richard Bonneau, Alexandra Lin, Sayed M. E. Sahraeian, Pier Luigi Martelli, Giuseppe Profiti, Rita Casadio, Renzhi Cao, Zhaolong Zhong, Jianlin Cheng, Adrian Altenhoff, Nives Skunca, Christophe Dessimoz, Tunca Dogan, Kai Hakala, Suwisa Kaewphan, Farrokh Mehryary, Tapio Salakoski, Filip Ginter, Hai Fang, Ben Smithers, Matt Oates, Julian Gough, Petri Törönen, Patrik Koskinen, Liisa Holm, Ching-Tai Chen, Wen-Lian Hsu, Kevin Bryson, Domenico Cozzetto, Federico Minneci, David T. Jones, Samuel Chapman, Dukka BKC, Ishita K. Khan, Daisuke Kihara, Dan Ofer, Nadav Rappoport, Amos Stern, Elena Cibrian-Uhalte, Paul Denny, Rebecca E. Foulger, Reija Hieta, Duncan Legge, Ruth C. Lovering, Michele Magrane, Anna N. Melidoni, Prudence Mutowo-Meullenet, Klemens Pichler, Aleksandra Shypitsyna, Biao Li, Pooya Zakeri, Sarah ElShal, Léon-Charles Tranchevent, Sayoni Das, Natalie L. Dawson, David Lee, Jonathan G. Lees, Ian Sillitoe, Prajwal Bhat, Tamás Nepusz, Alfonso E. Romero, Rajkumar Sasidharan, Haixuan Yang, Alberto Paccanaro, Jesse Gillis, Adriana E. Sedeño-Cortés, Paul Pavlidis, Shou Feng, Juan M. Cejuela, Tatyana Goldberg, Tobias Hamp, Lothar Richter, Asaf Salamov, Toni Gabaldon, Marina Marcet-Houben, Fran Supek, Qingtian Gong, Wei Ning, Yuanpeng Zhou, Weidong Tian, Marco Falda, Paolo Fontana, Enrico Lavezzo, Stefano Toppo, Carlo Ferrari, Manuel Giollo, Damiano Piovesan, Silvio C.E. Tosatto, Angela del Pozo, José M. Fernández, Paolo Maietta, Alfonso Valencia, Michael L. Tress, Alfredo Benso, Stefano Di Carlo, Gianfranco Politano, Alessandro Savino, Hafeez Ur Rehman, Matteo Re, Marco Mesiti, Giorgio Valentini, Joachim W. Bargsten, Aalt D. J. van Dijk, Branislava Gemovic, Sanja Glisic, Vladmir Perovic, Veljko Veljkovic, Nevena Veljkovic, Danillo C. Almeida-e-Silva, Ricardo Z. N. Vencio, Malvika Sharan, Jörg Vogel, Lakesh Kansakar, Shanshan Zhang, Slobodan Vucetic, Zheng Wang, Michael J. E. Sternberg, Mark N. Wass, Rachael P. Huntley, Maria J. Martin, Claire O’Donovan, Peter N. Robinson, Yves Moreau, Anna Tramontano, Patricia C. Babbitt, Steven E. Brenner, Michal Linial, Christine A. Orengo, Burkhard Rost, Casey S. Greene, Sean D. Mooney, Iddo Friedberg [ORCID 0000-0002-1789-8000], Predrag Radivojac (147 authors in total), Apollo - University of Cambridge Repository, Biotechnology and Biological Sciences Research Council (BBSRC, United Kingdom), National Science Foundation (United States), United States Department of Health & Human Services, National Natural Science Foundation of China, Natural Sciences and Engineering Research Council (Canada), São Paulo Research Foundation, Ministerio de Economía y Competitividad (Spain), Katholieke Universiteit Leuven (Belgium), Newton International Fellowship Scheme of the Royal Society grant, British Heart Foundation, Ministry of Education, Science and Technological Development (Serbia), Office of Biological and Environmental Research (United States), Australian Research Council, University of Padua (Italy), Swiss National Science Foundation, Institute of Biotechnology, Computational genomics, and Bioinformatics
Subjects: 0301 basic medicine, Computer science, Disease gene prioritization, Protein function prediction, Ecology, Evolution, Behavior and Systematics, Genetics, Cell Biology, 05 Environmental Sciences, 600 Technik, Medizin, angewandte Wissenschaften::610 Medizin und Gesundheit, computer.software_genre, Quantitative Biology - Quantitative Methods, Wiskundige en Statistische Methoden - Biometris, Field (computer science), Laboratorium voor Plantenveredeling, Function (engineering), Databases, Protein, 1183 Plant biology, microbiology, virology, Quantitative Methods (q-bio.QM), media_common, Genetics & Heredity, Settore BIO/11 - BIOLOGIA MOLECOLARE, Ecology, SISTA, 1184 Genetics, developmental biology, physiology, Life Sciences & Biomedicine, Algorithms, Bioinformatics, Evolution, media_common.quotation_subject, BIOINFORMÁTICA, Machine learning, Bottleneck, Set (abstract data type), BIOS Applied Bioinformatics, 03 medical and health sciences, Annotation, Structure-Activity Relationship, Behavior and Systematics, Human Phenotype Ontology, Humans, ddc:610, DISINTEGRIN, Mathematical and Statistical Methods - Biometris, BIOINFORMATICS, 08 Information And Computing Sciences, Science & Technology, business.industry, Research, ADAM, Proteins, Computational Biology, Molecular Sequence Annotation, 06 Biological Sciences, Data set, ONTOLOGY, Plant Breeding, 030104 developmental biology, Gene Ontology, Biotechnology & Applied Microbiology, FOS: Biological sciences, Artificial intelligence, business, computer, Software
Abstract: BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. RESULTS: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.
Acknowledgments: We acknowledge the contributions of Maximilian Hecht, Alexander Grün, Julia Krumhoff, My Nguyen Ly, Jonathan Boidol, Rene Schoeffel, Yann Spöri, Jessika Binder, Christoph Hamm and Karolina Worf.
This work was partially supported by the following grants: National Science Foundation grants DBI-1458477 (PR), DBI-1458443 (SDM), DBI-1458390 (CSG), DBI-1458359 (IF), IIS-1319551 (DK), DBI-1262189 (DK), and DBI-1149224 (JC); National Institutes of Health grants R01GM093123 (JC), R01GM097528 (DK), R01GM076990 (PP), R01GM071749 (SEB), R01LM009722 (SDM), and UL1TR000423 (SDM); the National Natural Science Foundation of China grants 3147124 (WT) and 91231116 (WT); the National Basic Research Program of China grant 2012CB316505 (WT); NSERC grant RGPIN 371348-11 (PP); FP7 infrastructure project TransPLANT Award 283496 (ADJvD); Microsoft Research/FAPESP grant 2009/53161-6 and FAPESP fellowship 2010/50491-1 (DCAeS); Biotechnology and Biological Sciences Research Council grants BB/L020505/1 (DTJ), BB/F020481/1 (MJES), BB/K004131/1 (AP), BB/F00964X/1 (AP), and BB/L018241/1 (CD); the Spanish Ministry of Economics and Competitiveness grant BIO2012-40205 (MT); KU Leuven CoE PFV/10/016 SymBioSys (YM); the Newton International Fellowship Scheme of the Royal Society grant NF080750 (TN). CSG was supported in part by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative grant GBMF4552. Computational resources were provided by CSC – IT Center for Science Ltd., Espoo, Finland (TS). This work was supported by the Academy of Finland (TS). RCL and ANM were supported by British Heart Foundation grant RG/13/5/30112. PD, RCL, and REF were supported by Parkinson’s UK grant G-1307, the Alexander von Humboldt Foundation through the German Federal Ministry for Education and Research, Ernst Ludwig Ehrlich Studienwerk, and the Ministry of Education, Science and Technological Development of the Republic of Serbia grant 173001. 
This work was a Technology Development effort for ENIGMA – Ecosystems and Networks Integrated with Genes and Molecular Assemblies (http://enigma.lbl.gov), a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, which is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological & Environmental Research grant DE-AC02-05CH11231. ENIGMA only covers the application of this work to microbial proteins. NSF DBI-0965616 and Australian Research Council grant DP150101550 (KMV). NSF DBI-0965768 (ABH). NIH T15 LM00945102 (training grant for CSF). FP7 FET grant MAESTRA ICT-2013-612944 and FP7 REGPOT grant InnoMol (FS). NIH R01 GM60595 (PCB). University of Padova grants CPDA138081/13 (ST) and GRIC13AAI9 (EL). Swiss National Science Foundation grant 150654 and UK BBSRC grant BB/M015009/1 (COD). PRB2 IPT13/0001 - ISCIII-SGEFI / FEDER (JMF). This is the final version of the article. It first appeared from BioMed Central at http://dx.doi.org/10.1186/s13059-016-1037-6.
Published: 2016

45. The evolution of function in strictosidine synthase-like proteins

Author: Lesley-Ann Giddings, Patricia C. Babbitt, Jenna Caldwell, Sarah E. O'Connor, Michael A. Hicks, and Alan E. Barber
Subjects: Strictosidine synthase, Biology, Sequence alignment, Biochemistry, Enzyme, Chemistry, Structural Biology, Phylogenetics, Strictosidine, Lactonase, Gluconolactonase, Molecular Biology, Function (biology)
Abstract: The exponential growth of sequence data provides abundant information for the discovery of new enzyme reactions. However, correctly annotating the functions of highly diverse proteins can be difficult, which hinders use of this information. Global analysis of large superfamilies of related proteins is a powerful strategy for understanding the evolution of reactions by identifying catalytic commonalities and differences in reaction and substrate specificity, even when only a few members have been biochemically or structurally characterized. A comparison of >2500 sequences sharing the six-bladed β-propeller fold establishes sequence, structural, and functional links among the three subgroups of the functionally diverse N6P superfamily: the arylesterase-like and senescence marker protein-30/gluconolactonase/luciferin-regenerating enzyme-like (SGL) subgroups, representing enzymes that catalyze lactonase and related hydrolytic reactions, and the so-called strictosidine synthase-like (SSL) subgroup. Metal-coordinating residues were identified as broadly conserved in the active sites of all three subgroups except for a few proteins from the SSL subgroup, which have been experimentally determined to catalyze the quite different strictosidine synthase (SS) reaction, a metal-independent condensation reaction. Despite these differences, comparison of conserved catalytic features of the arylesterase-like and SGL enzymes with the SSs identified similar structural and mechanistic attributes between the hydrolytic reactions catalyzed by the former and the condensation reaction catalyzed by SS. The results also suggest that despite their annotations, the great majority of these >500 SSL sequences do not catalyze the SS reaction; rather, they likely catalyze hydrolytic reactions typical of the other two subgroups. This prediction was confirmed experimentally for one of these proteins.
Published: 2011

46. Mutations in PNKD causing paroxysmal dyskinesia alters protein cleavage and stability

Author: Hsien-Yang Lee, Sunil Ojha, Ying-Hui Fu, Joel M. Rawson, Patricia C. Babbitt, Louis J. Ptáček, and Yiguo Shen
Subjects: Mutant, Fluorescent Antibody Technique, Muscle Proteins, Medical and Health Sciences, Mice, Transgenic, 0302 clinical medicine, 2.1 Biological and endogenous factors, Luciferases, Cells, Cultured, Genetics (clinical), Cellular localization, Genetics & Heredity, 0303 health sciences, Protein Stability, Articles, General Medicine, Biological Sciences, Cell biology, PNKD, Biochemistry, Knockout mouse, Drosophila, Genetically modified mouse, Paroxysmal nonkinesigenic dyskinesia, Transgene, Blotting, Western, Mutation, Missense, Biology, beta-Lactamases, 03 medical and health sciences, Rare Diseases, Chorea, Genetics, Animals, Immunoprecipitation, Molecular Biology, 030304 developmental biology, Paroxysmal dyskinesia, Protein Structure, Tertiary, Generic health relevance, 030217 neurology & neurosurgery
Abstract: Paroxysmal non-kinesigenic dyskinesia (PNKD) is a rare autosomal dominant movement disorder triggered by stress, fatigue or consumption of either alcohol or caffeine. Attacks last 1–4 h and consist of dramatic dystonia and choreoathetosis in the limbs, trunk and face. The disease is associated with single amino acid changes (A7V or A9V) in PNKD, a protein of unknown function. Here we studied the stability, cellular localization and enzymatic activity of the PNKD protein in cultured cells and transgenic animals. The N-terminus of the wild-type (WT) long PNKD isoform (PNKD-L) undergoes a cleavage event in vitro, and disease-associated mutations confer resistance to this cleavage. Mutant PNKD-L protein is degraded faster than the WT protein. These results suggest that the disease mutations underlying PNKD may disrupt protein processing in vivo, a hypothesis supported by our observation of decreased cortical Pnkd-L levels in mutant transgenic mice. Pnkd is homologous to a superfamily of enzymes with conserved β-lactamase domains. It shares the highest homology with glyoxalase II but does not catalyze the same reaction. Glutathione levels were lower in cortex lysates from Pnkd knockout mice than in those from WT littermates. Taken together, our results suggest an important role for the Pnkd protein in maintaining cellular redox status.
Published: 2011

47. New computational approaches to understanding molecular protein function

Author: Patricia C. Babbitt and Jacquelyn S. Fetrow
Subjects: Models, Molecular, Protein Structure Comparison, 0301 basic medicine, Computer science, Biochemistry, Database and Informatics Methods, Catalytic Domain, Macromolecular Structure Analysis, Cluster Analysis, Database Searching, Biology (General), Databases, Protein, Peptide sequence, Protein function, Ecology, Organic Compounds, Gene Ontology, Gene Ontologies, Genomics, Genome project, Enzymes, Chemistry, Editorial, Computational Theory and Mathematics, Modeling and Simulation, Physical Sciences, Sequence Analysis, Multiple Alignment Calculation, Protein Structure, Bioinformatics, QH301-705.5, Sequence Databases, Sequence alignment, Computational biology, Research and Analysis Methods, 03 medical and health sciences, Cellular and Molecular Neuroscience, Text mining, Sequence Motif Analysis, Molecular evolution, Computational Techniques, Genetics, Amino Acid Sequence, Sequence Similarity Searching, Molecular Biology, Ecology, Evolution, Behavior and Systematics, Terpenes, Organic Chemistry, Chemical Compounds, Proteins, Biology and Life Sciences, Genome Analysis, Split-Decomposition Method, Biological Databases, 030104 developmental biology, Enzymology
Published: 2018

48. Evolutionary constraints on structural similarity in orthologs and paralogs

Author: Feng Chen, Patricia C. Babbitt, David S. Roos, Jeffery G. Saven, Mark Erik Peterson, and Andrej Sali
Subjects: Genetics, animal structures, Structural similarity, fungi, Proteins, Sequence alignment, Biology, Biochemistry, Article, Protein Structure, Tertiary, Structural genomics, Evolution, Molecular, Structure-Activity Relationship, Protein sequencing, Similarity (network science), Evolutionary biology, Amino Acid Sequence, Homology modeling, Databases, Protein, Sequence Alignment, Molecular Biology, Peptide sequence, Sequence (medicine)
Abstract: Although a quantitative relationship between sequence similarity and structural similarity has long been established, little is known about the impact of orthology on the relationship between protein sequence and structure. Among homologs, orthologs (derived by speciation) more frequently have similar functions than paralogs (derived by duplication). Here, we hypothesize that an orthologous pair will tend to exhibit greater structural similarity than a paralogous pair at the same level of sequence similarity. To test this hypothesis, we used 284,459 pairwise structure-based alignments of 12,634 unique domains from SCOP as well as orthology and paralogy assignments from OrthoMCL DB. We divided the comparisons by sequence identity and determined whether the sequence-structure relationship differed between the orthologs and paralogs. We found that at levels of sequence identity between 30 and 70%, orthologous domain pairs indeed tend to be significantly more structurally similar than paralogous pairs at the same level of sequence identity. An even larger difference is found when comparing ligand binding residues instead of whole domains. These differences between orthologs and paralogs are expected to be useful for selecting template structures in comparative modeling and target proteins in structural genomics.
Published: 2009

49. Biophysical studies support a predicted superhelical structure with armadillo repeats for Ric-8

Author: José Martínez-Oyanedel, Patricia C. Babbitt, María Victoria Hinrichs, Marta Bunster, Juan Olate, and Maximiliano Figueroa
Subjects: Circular dichroism, Computational biology, Biology, Biochemistry, Molecular Biology, Protein structure, hemic and lymphatic diseases, Heterotrimeric G protein, Armadillo repeats, Armadillo, Protein folding, Threading (protein sequence), Alpha helix
Abstract: Ric-8 is a highly conserved cytosolic protein (MW 63 kDa) initially identified in C. elegans as an essential factor in neurotransmitter release and asymmetric cell division. Two different isoforms have been described in mammals, Ric-8A and Ric-8B; each possesses guanine nucleotide exchange factor (GEF) activity on heterotrimeric G-proteins, but with different Gα subunit specificities. To gain insight into the mechanisms involved in Ric-8 cellular functions, it is essential to obtain information about its structure. Therefore, the aim of this work was to create a structural model for Ric-8. Because Ric-8 shows no sequence similarity to any other protein, it was not possible to build a model by comparison with a template structure. Consequently, different bioinformatics approaches, including protein folding and structure prediction, were used. The Ric-8 structural model is composed of 10 armadillo folding motifs, organized in a right-twisted α-α superhelix. To validate the structural model, a His-tag fusion construct of Ric-8 was expressed in E. coli, purified by affinity and anion exchange chromatography, and subjected to circular dichroism (CD) analysis and thermostability studies. Ric-8 is approximately 80% α-helical, with a Tm of 43.1°C, consistent with an armadillo-type structure such as α-importin, a protein composed of 10 armadillo repeats. The proposed structural model for Ric-8 is intriguing because armadillo proteins are known to interact with multiple partners and participate in diverse cellular functions. These results open the possibility of finding new protein partners for Ric-8 with new cellular functions.
Published: 2009

50. Target selection and annotation for the structural genomics of the amidohydrolase and enolase superfamilies

Author: John A. Gerlt, Matthew P. Jacobson, J. Michael Sauder, Steven C. Almo, Narayanan Eswar, Xiaojing Zheng, Patricia C. Babbitt, Ursula Pieper, Jennifer L. Seffernick, Andrej Sali, Shoshana D. Brown, Margaret E. Glasner, Mark R. Chance, Libusha Kelly, Ranyee A. Chiang, Subramanyam Swaminathan, Jeffrey B. Bonanno, Stephen K. Burley, and Frank M. Raushel
Subjects: Protein Folding, Bioinformatics, Protein Conformation, Enolase, Structural genomics, Target selection, Isomerase, 010402 general chemistry, Structure annotation, 01 natural sciences, Biochemistry, Genome, Article, Amidohydrolases, Substrate Specificity, 03 medical and health sciences, Structural Biology, Plant Genetics & Genomics, TIM barrel, Genetics, Databases, Protein, 030304 developmental biology, 0303 health sciences, Binding Sites, Biology, Amidohydrolase, Enolase superfamily, Mandelate racemase, Animal Genetics and Genomics, Life Sciences, Computational Biology, Human Genetics, General Medicine, Genomics, 0104 chemical sciences, Biochemistry, general, Phosphopyruvate Hydratase, Amidohydrolase and enolase superfamilies, Microbial Genetics and Genomics
Abstract: To study the substrate specificity of enzymes, we use the amidohydrolase and enolase superfamilies as model systems; members of these superfamilies share a common TIM barrel fold and catalyze a wide range of chemical reactions. Here, we describe a collaboration between the Enzyme Specificity Consortium (ENSPEC) and the New York SGX Research Center for Structural Genomics (NYSGXRC) that aims to maximize the structural coverage of the amidohydrolase and enolase superfamilies. Using sequence- and structure-based protein comparisons, we first selected 535 target proteins from a variety of genomes for high-throughput structure determination by X-ray crystallography; 63 of these targets were not previously annotated as superfamily members. To date, 20 unique amidohydrolase and 41 unique enolase structures have been determined, increasing the fraction of sequences in the two superfamilies that can be modeled based on at least 30% sequence identity from 45% to 73%. We present case studies of proteins related to uronate isomerase (an amidohydrolase superfamily member) and mandelate racemase (an enolase superfamily member), to illustrate how this structure-focused approach can be used to generate hypotheses about sequence–structure–function relationships. Electronic supplementary material: The online version of this article (doi:10.1007/s10969-008-9056-5) contains supplementary material, which is available to authorized users.
Published: 2009
