1,021 search results for "Structural bioinformatics"
2. ASGARD. A simple and automatic GROMACS tool to analyze Molecular Dynamic simulations
- Author
Alejandro Rodríguez-Martínez, Jochem Nelen, Miguel Carmena-Bargueño, Carlos Martínez-Cortés, Irene Luque, and Horacio Pérez-Sánchez
- Subjects
Molecular Modeling ,GROMACS ,Molecular Dynamics ,Structural Bioinformatics ,Analysis
- Abstract
One of the most prominent techniques for analyzing the physical movement of molecules is molecular dynamics (MD) simulation. Among the programs used to run such simulations, GROMACS is one of the most widely used open-source packages. We present ASGARD, software that analyzes MD simulations of proteins or protein-ligand complexes and generates the corresponding report through an automated MD workflow. Once the simulation is complete, the tool automatically runs a set of analyses. These include reports on overall system stability and a system flexibility analysis with RMSD, fluctuation, and distribution calculations. Finally, it performs a dynamic analysis with SASA and DSSP graphs, together with several interaction analyses. In short, ASGARD lets the user run MD simulation analysis with a single command instead of invoking several GROMACS programs and inspecting each result individually. It then automatically creates an analysis report that helps in understanding the molecular interactions and structural changes. The uploaded files include the data generated by ASGARD (XVG, PNG, and PDF report files) and the GROMACS MD output files (trajectory files, topology files, and other file types) used as the input folder in the three cases.
- Published
- 2023
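The stability and flexibility analyses described for ASGARD rest on two standard quantities. The first is the RMSD of each frame against a reference. The second is the per-atom RMS fluctuation (RMSF) around the time-averaged position. The following is a plain-Python illustration of those definitions only. It is not ASGARD's code, which drives GROMACS tools. It assumes the frames have already been fitted onto the reference, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(frame: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two already-fitted coordinate sets."""
    if len(frame) != len(reference) or not frame:
        raise ValueError("coordinate sets must be non-empty and of equal size")
    squared = sum(
        (x - xr) ** 2 + (y - yr) ** 2 + (z - zr) ** 2
        for (x, y, z), (xr, yr, zr) in zip(frame, reference)
    )
    return math.sqrt(squared / len(frame))


def rmsf(trajectory: Sequence[Sequence[Coord]]) -> List[float]:
    """Per-atom root-mean-square fluctuation around the time-averaged position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    fluctuations = []
    for atom in range(n_atoms):
        # Time-averaged position of this atom
        mean = [sum(frame[atom][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared displacement from that average over all frames
        msd = sum(
            sum((frame[atom][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        fluctuations.append(math.sqrt(msd))
    return fluctuations


# Toy two-atom, two-frame trajectory (made-up coordinates in nm)
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
trajectory = [reference, [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]]
print([round(rmsd(frame, reference), 3) for frame in trajectory])  # [0.0, 1.414]
print([round(value, 3) for value in rmsf(trajectory)])  # [0.0, 1.0]
```

In practice these values come from GROMACS itself. The sketch only makes explicit what the plotted curves measure.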
3. Structural Cheminformatics for Kinase-Centric Drug Design
- Author
Dominique Sydow
- Subjects
kinases ,drug design ,binding sites ,open science ,cheminformatics ,target prediction ,structural bioinformatics ,500 Natural sciences and mathematics::540 Chemistry::540 Chemistry and allied sciences ,fragment-based drug design
- Abstract
Drug development is a long, expensive, and iterative process with a high failure rate, while patients wait impatiently for treatment. Kinases are among the main drug targets studied over the last decades to combat cancer, the second leading cause of death worldwide. These efforts resulted in a plethora of structural, chemical, and pharmacological kinase data, which are collected in the KLIFS database. In this thesis, we apply ideas from structural cheminformatics to the rich KLIFS dataset, aiming to provide computational tools that speed up the complex drug discovery process. We focus on methods for target prediction and fragment-based drug design that study characteristics of kinase binding sites (also called pockets). First, we introduce the concept of computational target prediction, which is vital in the early stages of drug discovery. This approach identifies biological entities such as proteins that may (i) modulate a disease of interest (targets or on-targets) or (ii) cause unwanted side effects due to their similarity to on-targets (off-targets). We focus on the research field of binding site comparison, which lacked a freely available and efficient tool to determine similarities between the highly conserved kinase pockets. We fill this gap with the novel method KiSSim, which encodes and compares spatial and physicochemical pocket properties for all structurally resolved kinases (kinome). We study kinase similarities in the form of kinome-wide phylogenetic trees and detect expected and unexpected off-targets. To allow multiple perspectives on kinase similarity, we propose an automated and production-ready pipeline. With it, user-defined kinases can be inspected complementarily based on their pocket sequence and structure (KiSSim), pocket-ligand interactions, and ligand profiles. Second, we introduce the concept of fragment-based drug design, which is useful for identifying and optimizing active and promising molecules (hits and leads). 
This approach identifies low-molecular-weight molecules (fragments) that bind weakly to a target and are then grown into larger high-affinity drug-like molecules. With the novel method KinFragLib, we provide a fragment dataset for kinases (fragment library) by viewing kinase inhibitors as combinations of fragments. Kinases have a highly conserved pocket with well-defined regions (subpockets). Based on the subpockets that they occupy, we fragment kinase inhibitors in experimentally resolved protein-ligand complexes. The resulting dataset is used to generate novel kinase-focused molecules that are recombinations of the previously fragmented kinase inhibitors while considering their subpockets. The KinFragLib and KiSSim methods are published as freely available Python tools. Third, we advocate for open and reproducible research that applies FAIR principles (data and software shall be findable, accessible, interoperable, and reusable) and software best practices. In this context, we present the TeachOpenCADD platform, which contains pipelines for computer-aided drug design. We use open-source software and data to demonstrate ligand-based applications from cheminformatics and structure-based applications from structural bioinformatics. To emphasize the importance of FAIR data, we dedicate several topics to accessing life science databases such as ChEMBL, PubChem, PDB, and KLIFS. These pipelines not only help novices in the field gain domain-specific skills but can also serve as a starting point for studying research questions. Furthermore, we show an example of how to build a stand-alone tool that formalizes recurring project-overarching tasks: OpenCADD-KLIFS offers a clean and user-friendly Python API to interact with the KLIFS database and fetch different kinase data types. This tool has been used in this thesis and beyond to support kinase-focused projects. 
We believe that the FAIR-based methods, tools, and pipelines presented in this thesis (i) are valuable additions to the toolbox for kinase research, (ii) provide relevant material for scientists who seek to learn, teach, or answer questions in the realm of computer-aided drug design, and (iii) contribute to making drug discovery more efficient, reproducible, and reusable.
- Published
- 2023
4. An Overview of the Curcumin-Based and Allicin Bioactive Compounds as potential treatment to SARS-CoV-2 with structural bioinformatics tools
- Author
Axel Fortunio Shiloputra, Arli Aditya Parikesit, Viona Pricillia, Dora Dayu Rahma Turista, Arif Nur Muhammad Ansori, and Jeremie Theddy Darmawan
- Subjects
Structural bioinformatics ,Proteases ,Drug discovery ,medicine ,RNA-dependent RNA polymerase ,RNA ,Computational biology ,Biology ,Polymerase ,Virus ,Coronavirus
- Abstract
The recent global outbreak of SARS-CoV-2 and the absence of a specific cure for the disease have led the scientific community to investigate alternative indigenous treatments. SARS-CoV-2 is the virus responsible for coronavirus disease 2019 (COVID-19). The virus has four structural proteins, namely the S (spike), E (envelope), M (membrane), and N (nucleocapsid) proteins. The main proteases and the RNA-dependent RNA polymerase are also essential for viral replication and survival. Each of these proteins is a potential drug target and a starting point for drug discovery against the virus. Currently available treatments are not specific to the disease and therefore carry adverse effects that can be highly dangerous and sometimes fatal. Many of these treatments are supplementary in nature or based on drugs repurposed from other viral outbreaks. Alternatives to conventional drugs are needed to control the spread and severity of the disease. Allicin, curcumin, and their derivatives have been studied for their antiviral properties. They have shown good binding affinity towards SARS-CoV-2 structures essential for viral survival, especially the main proteases and RNA-dependent RNA polymerases. Structural bioinformatics tools provide methods to predict the bioactivity of these natural product-based compounds. Beyond their therapeutic benefits, natural products offer further advantages in the current pandemic in terms of supply, logistics, and affordability.
- Published
- 2021
5. Limits and potential of combined folding and docking
- Author
Petras J. Kundrotas, Gabriele Pozzati, Arne Elofsson, Claudio Bassot, John Lamb, and Wensi Zhu
- Subjects
Statistics and Probability ,Alternative methods ,Supplementary data ,AcademicSubjects/SCI01060 ,Computer science ,Pipeline (computing) ,Deep learning ,Folding (DSP implementation) ,Original Papers ,Structural Bioinformatics ,Biochemistry ,Computer Science Applications ,Computational Mathematics ,Docking (dog) ,Computational Theory and Mathematics ,De novo protein structure prediction ,DOCK ,Data mining ,Artificial intelligence ,business ,Molecular Biology ,computer
- Abstract
Motivation In the last decade, the accuracy of de novo protein structure prediction for individual proteins has improved significantly through deep learning (DL) methods that harvest co-evolution information from large multiple sequence alignments (MSAs). The same approach can, in principle, also be used to extract information about evolution-based contacts across protein–protein interfaces. However, most earlier studies have not used the latest DL methods for inter-chain contact distance prediction. This article introduces a fold-and-dock method based on residue-residue distances predicted with trRosetta. Results The method can simultaneously predict the tertiary and quaternary structure of a protein pair, even when the structures of the monomers are not known. The straightforward application of this method to a standard dataset for protein–protein docking yielded limited success. However, using alternative methods for generating MSAs allowed us to accurately dock significantly more proteins. We also introduced a novel scoring function, PconsDock, that accurately separates 98% of correctly and incorrectly folded and docked proteins. The average performance of the method is comparable to that of traditional, template-based or ab initio shape-complementarity-only docking methods. Moreover, the results of conventional and fold-and-dock approaches are complementary, and thus a combined docking pipeline could increase overall docking success significantly. This methodology contributed to the best model for one of the CASP14 oligomeric targets, H1065. Availability and implementation All scripts for predictions and analysis are available from https://github.com/ElofssonLab/bioinfo-toolbox/ and https://gitlab.com/ElofssonLab/benchmark5/. All models, joined alignments, and evaluation results are available from the figshare repository at https://doi.org/10.6084/m9.figshare.14654886.v2. 
Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2021
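In the fold-and-dock idea, the two chains are treated as one concatenated sequence. Inter-chain contacts therefore appear in the off-diagonal block of the predicted residue-residue distance matrix. The following extracts that block under stated assumptions. trRosetta actually predicts binned distance distributions, and we assume these have already been reduced to one expected distance per residue pair. The 8 Å cutoff and the function name are illustrative choices, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def interchain_contacts(
    distances: Sequence[Sequence[float]],
    len_a: int,
    cutoff: float = 8.0,
) -> List[Tuple[int, int, float]]:
    """Return (i, j, d) for chain-A residue i and chain-B residue j with d < cutoff.

    `distances` is a square matrix for the concatenated sequence A+B, so rows
    and columns 0..len_a-1 belong to chain A and the remaining ones to chain B.
    Both returned indices are 0-based within their own chain.
    """
    n = len(distances)
    if not 0 < len_a < n:
        raise ValueError("len_a must split the matrix into two non-empty chains")
    contacts = []
    for i in range(len_a):
        for j in range(len_a, n):  # only the inter-chain (A x B) block
            d = distances[i][j]
            if d < cutoff:
                contacts.append((i, j - len_a, d))
    contacts.sort(key=lambda c: c[2])  # closest predicted pairs first
    return contacts


# Toy example: chain A has 2 residues, chain B has 1 (distances in Å, made up)
matrix = [
    [0.0, 3.8, 12.0],
    [3.8, 0.0, 6.5],
    [12.0, 6.5, 0.0],
]
print(interchain_contacts(matrix, len_a=2))  # [(1, 0, 6.5)]
```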
6. RCSB Protein Data Bank: Celebrating 50 years of the PDB with new tools for understanding and visualizing biological macromolecules in 3D
- Author
Stephen K. Burley, Zukang Feng, John D. Westbrook, Andrej Sali, Justin W Flatt, Rachel Kramer Green, Chenghua Shao, Sai J. Ganesan, Sutapa Ghosh, Brinda Vallat, David S. Goodsell, Jeremy Henry, Christine Zardecki, Joan Segura, Ezra Peisach, Charmi Bhikadiya, Catherine L. Lawson, Jose M. Duarte, Brian P. Hudson, Irina Persikova, Chunxiao Bi, Gregg V. Crichlow, Robert Lowe, Monica Sekharan, Jasmine Young, Shamara Whetstone, Li Chen, Vladimir Guranovic, Yu-He Liang, Dennis W Piehl, Maryam Fayazi, Maria Voigt, Sebastian Bittrich, Shuchismita Dutta, and Yana Rose
- Subjects
Tools for Protein Science ,Computer science ,Macromolecular crystallography ,Protein Data Bank (RCSB PDB) ,Computational Biology ,Effective management ,Biomolecular structure ,History, 20th Century ,Collaboratory ,Protein Data Bank ,History, 21st Century ,Biochemistry ,World Wide Web ,Anniversaries and Special Events ,User-Computer Interface ,Structural bioinformatics ,Experimental methods ,Databases, Protein ,Molecular Biology ,computer
- Abstract
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), funded by the US National Science Foundation, National Institutes of Health, and Department of Energy, has served structural biologists and Protein Data Bank (PDB) data consumers worldwide since 1999. RCSB PDB, a founding member of the Worldwide Protein Data Bank (wwPDB) partnership, is the US data center for the global PDB archive housing biomolecular structure data. RCSB PDB is also responsible for the security of PDB data, as the wwPDB-designated Archive Keeper. Annually, RCSB PDB serves tens of thousands of three-dimensional (3D) macromolecular structure data depositors (using macromolecular crystallography, nuclear magnetic resonance spectroscopy, electron microscopy, and micro-electron diffraction) from all inhabited continents. RCSB PDB makes PDB data available from its research-focused RCSB.org web portal at no charge and without usage restrictions to millions of PDB data consumers working in every nation and territory worldwide. In addition, RCSB PDB operates an outreach and education PDB101.RCSB.org web portal that was used by more than 800,000 educators, students, and members of the public during calendar year 2020. This invited Tools Issue contribution describes (i) how the archive is growing and evolving as new experimental methods generate ever larger and more complex biomolecular structures; (ii) the importance of data standards and data remediation in effective management of the archive and facile integration with more than 50 external data resources; and (iii) new tools and features for 3D structure analysis and visualization made available during the past year via the RCSB.org web portal.
- Published
- 2021
7. PDB-101: Educational resources supporting molecular explorations through biology and medicine
- Author
Robert Lowe, Shuchismita Dutta, Christine Zardecki, Maria Voigt, Stephen K. Burley, and David S. Goodsell
- Subjects
Proteomics ,Web analytics ,Tools for Protein Science ,Protein Conformation ,Protein Data Bank (RCSB PDB) ,Proteins ,Collaboratory ,Crystallography, X-Ray ,Protein Data Bank ,Biochemistry ,World Wide Web ,Microscopy, Electron ,Structural bioinformatics ,Resource (project management) ,Structural biology ,Animals ,Humans ,Databases, Protein ,business ,Nuclear Magnetic Resonance, Biomolecular ,Molecular Biology ,Curriculum ,computer
- Abstract
The Protein Data Bank (PDB) archive is a rich source of information in the form of atomic-level 3D structures of biomolecules experimentally determined using macromolecular crystallography (MX), nuclear magnetic resonance (NMR) spectroscopy, and electron microscopy (3DEM). Originally established in 1971 as a resource for protein crystallographers to freely exchange data, today PDB data drive research and education across scientific disciplines. In 2011, the online portal PDB-101 was launched to support teachers, students, and the general public in PDB archive exploration (pdb101.rcsb.org). Maintained by the Research Collaboratory for Structural Bioinformatics PDB, PDB-101 aims to help train the next generation of PDB users and to promote the overall importance of structural biology and protein science to non-experts. Regularly published features include the highly popular Molecule of the Month series, 3D model activities, molecular animation videos, and educational curricula. Materials are organized into various categories (Health and Disease, Molecules of Life, Biotech and Nanotech, and Structures and Structure Determination) and are searchable by keyword. A biennial health focus frames new resource creation and provides topics for annual video challenges for high school students. Web analytics document that PDB-101 materials relating to fundamental topics (e.g., hemoglobin, catalase) are highly accessed year-on-year. In addition, PDB-101 materials created in response to topical health matters (e.g., Zika, measles, coronavirus) are well received. PDB-101 shows how learning about the diverse shapes and functions of PDB structures promotes understanding of all aspects of biology, from the central dogma of biology to health and disease to biological energy.
- Published
- 2021
8. Machine-Directed Evolution of an Imine Reductase for Activity and Stereoselectivity
- Author
Ernst Freund, Dan Huynh, Richard A. Lewis, Eric J. Ma, Markus Stoeckli, Elke Koch, Holger Schlingensiepen, Markus Vogel, Mathieu Ligibel, Radka Snajdrova, Charles Moore, Elina Maria Siirola, Luca Siegrist, Caroline Bouquet, Michael Faller, Edward J. Oakeley, Geoffrey Cutler, Anne-Christine Acker, Fabian K. Eggimann, and Arkadij Kummer
- Subjects
Structural bioinformatics ,chemistry ,Stereochemistry ,Biocatalysis ,Imine ,Stereoselectivity ,General Chemistry ,Directed Molecular Evolution ,Reductase ,Directed evolution ,Catalysis
- Published
- 2021
9. Design of Inhibitory Peptides of the Interaction between the RBD Domain of the S1 Protein of SARS-CoV-2 and the Angiotensin-Converting Enzyme 2 (ACE2) Receptor
- Author
Carlos Andrés Rodríguez-Salazar, Delia Piedad Recalde-Reyes, and Jhon Carlos Castaño-Osorio
- Subjects
Viral protein ,Chemistry ,Pharmaceutical Science ,Inhibitory postsynaptic potential ,Protein–protein interaction ,Alveolar cells ,Structural bioinformatics ,Biochemistry ,Drug Discovery ,Angiotensin-converting enzyme 2 ,medicine ,Molecular Medicine ,Cytotoxic T cell ,Receptor
- Abstract
Background: The recent outbreak of COVID-19, caused by SARS-CoV-2, has been cataloged as a global catastrophe due to the growing number of infected cases and deaths since November 2019. To date, this contagious infectious disease has no vaccine or specific treatment available, which is why the number of cases continues to increase. SARS-CoV-2 infects humans through the interaction between the receptor-binding domain of the viral spike protein and the angiotensin-converting enzyme 2 receptor (rACE2), located predominantly in alveolar cells. Objective: This study aims to identify, using computational tools, peptides that inhibit the protein-protein interaction between the receptor-binding domain of the SARS-CoV-2 spike protein and the angiotensin-converting enzyme 2 receptor. Methods: Crystal structures were selected from the Research Collaboratory for Structural Bioinformatics Protein Data Bank, and interaction models between the viral protein and ACE2 were built. Inhibitory peptides of this interaction were then designed with the Rosetta web server and their interaction was validated with ClusPro. Finally, their theoretical physicochemical and cytotoxic properties were determined. Results: A protein complex was generated and modeled with ClusPro, and the balanced model with the lowest binding energy was selected. From the protein interactions in each crystal structure and in the model, eight 20-residue peptides were obtained. The theoretical evaluation indicated that all peptides are non-toxic; six are water-soluble and two are insoluble. Conclusion: We found eight peptides that interact with the receptor-binding domain of the SARS-CoV-2 spike protein, which could block its contact with the cell receptor and interfere with the infection process.
- Published
- 2021
10. OPUS-X: an open-source toolkit for protein torsion angles, secondary structure, solvent accessibility, contact map predictions and 3D folding
- Author
Gang Xu, Qinghua Wang, and Jianpeng Ma
- Subjects
Statistics and Probability ,AcademicSubjects/SCI01060 ,Computer science ,Orientation (computer vision) ,Folding (DSP implementation) ,Python (programming language) ,Original Papers ,Structural Bioinformatics ,Biochemistry ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Torsion (algebra) ,Code (cryptography) ,Protein folding ,Molecular Biology ,Algorithm ,Protocol (object-oriented programming) ,Protein secondary structure ,computer
- Abstract
Motivation The development of an open-source platform to predict protein 1D features and 3D structure is an important task. In this paper, we report an open-source toolkit for protein 3D structure modeling, named OPUS-X. It contains three modules: OPUS-TASS2, which predicts protein torsion angles, secondary structure, and solvent accessibility; OPUS-Contact, which measures the distance and orientation information between different residue pairs; and OPUS-Fold2, which uses the constraints derived from the first two modules to guide folding. Results OPUS-TASS2 is an upgraded version of our previous method OPUS-TASS. OPUS-TASS2 integrates protein global structure information and significantly outperforms OPUS-TASS. OPUS-Contact combines multiple raw co-evolutionary features with protein 1D features predicted by OPUS-TASS2, and delivers better results than the open-source state-of-the-art method trRosetta. OPUS-Fold2 is a complementary version of our previous method OPUS-Fold. OPUS-Fold2 is a gradient-based protein folding framework built on differentiable energy terms. By contrast, OPUS-Fold is a sampling-based method designed to handle non-differentiable terms. OPUS-Fold2 exhibits performance comparable to the Rosetta folding protocol in trRosetta when using identical inputs. OPUS-Fold2 is written in Python and TensorFlow 2.4, which makes source-code-level modification straightforward. Availability and implementation The code and pre-trained models of OPUS-X can be downloaded from https://github.com/OPUS-MaLab/opus_x. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2021
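OPUS-TASS2 predicts backbone torsion angles. For reference, a torsion (dihedral) angle is defined by four consecutive atoms; for example, phi uses C(i-1), N, CA and C(i). The following computes that angle from coordinates using the IUPAC sign convention. It is a generic geometric helper, not part of OPUS-X, and the name `dihedral` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3) -> float:
    """Torsion angle in degrees, between -180 and 180, defined by four atom positions."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane (p0, p1, p2)
    n2 = _cross(b2, b3)  # normal of the plane (p1, p2, p3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Looking down p1->p2, p3 sits 90 degrees counterclockwise from p0
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)), 1))  # -90.0
```

With `atan2`, the sign is resolved directly. A cosine-only formula would lose the distinction between, say, +60 and -60 degrees.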
11. Intrinsic protein disorder and conditional folding in AlphaFoldDB
- Author
Damiano Piovesan, Alexander Miguel Monzon, and Silvio C. E. Tosatto
- Subjects
Intrinsically Disordered Proteins ,Protein Folding ,machine learning ,CAID ,Protein Conformation ,critical assessment ,structural bioinformatics ,protein structure ,Molecular Biology ,Biochemistry ,intrinsically disordered proteins
- Abstract
Intrinsically disordered regions (IDRs), which defy the traditional protein structure-function paradigm, have been difficult to analyze. The availability of accurate structure predictions on a large scale in AlphaFoldDB offers a fresh perspective on IDR prediction. Here, we establish three baselines for IDR prediction from AlphaFoldDB models based on the recent CAID dataset. Surprisingly, AlphaFoldDB is highly competitive for predicting both IDRs and conditionally folded binding regions, demonstrating the plasticity of the disorder-to-structure continuum.
- Published
- 2022
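The abstract does not name the three baselines, so the following rests on an assumption. One natural baseline treats low AlphaFold confidence (pLDDT) as a disorder signal. The sketch converts per-residue pLDDT into a score of 1 - pLDDT/100. It then reports contiguous runs below 50, the "very low" confidence band used by AlphaFold DB. The threshold, minimum run length and function names are illustrative, not the authors' settings.

```python
from typing import List, Optional, Sequence, Tuple


def disorder_scores(plddt: Sequence[float]) -> List[float]:
    """Map per-residue pLDDT (0-100) to a disorder score in [0, 1]."""
    return [1.0 - p / 100.0 for p in plddt]


def disordered_regions(
    plddt: Sequence[float], threshold: float = 50.0, min_length: int = 1
) -> List[Tuple[int, int]]:
    """Return 0-based inclusive (start, end) runs of residues with pLDDT < threshold."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, score in enumerate(plddt):
        if score < threshold:
            if start is None:
                start = i  # a low-confidence run begins
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the C-terminus
    if start is not None and len(plddt) - start >= min_length:
        regions.append((start, len(plddt) - 1))
    return regions


# Made-up pLDDT values for a 7-residue chain
plddt = [92.0, 88.5, 40.0, 35.2, 47.9, 81.0, 30.0]
print(disordered_regions(plddt))                # [(2, 4), (6, 6)]
print(disordered_regions(plddt, min_length=2))  # [(2, 4)]
```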
12. Accurate prediction of protein structures and interactions using a three-track neural network
- Author
Jose Henrique Pereira, Ana C. Ebrecht, Lisa N. Kinch, R. Dustin Schaeffer, Ivan Anishchenko, Justas Dauparas, Udit Dalwadi, Gyu Rie Lee, Christoph Buhlheller, Diederik J. Opperman, David Baker, Tea Pavkov-Keller, Qian Cong, Caleb R. Glassman, Alberdina A. van Dijk, Jue Wang, Andria V. Rodrigues, Theo Sagmeister, Randy J. Read, Andy DeGiovanni, Hahnbeom Park, Paul D. Adams, Calvin K. Yip, Frank DiMaio, John E. Burke, Claudia Millán, K. Christopher Garcia, Carson Adams, Minkyung Baek, Nick V. Grishin, Sergey Ovchinnikov, and Manoj K. Rathinaswamy
- Subjects
Structure (mathematical logic) ,0303 health sciences ,Sequence ,Network architecture ,Multidisciplinary ,Artificial neural network ,Computer science ,Deep learning ,Modeling and simulation ,03 medical and health sciences ,Structural bioinformatics ,0302 clinical medicine ,Data mining ,Artificial intelligence ,business ,Distance transform ,computer ,030217 neurology & neurosurgery ,030304 developmental biology
- Abstract
DeepMind presented notably accurate predictions at the recent 14th Critical Assessment of Structure Prediction (CASP14) conference. We explored network architectures that incorporate related ideas and obtained the best performance with a three-track network in which information at the one-dimensional (1D) sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The three-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging x-ray crystallography and cryo-electron microscopy structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.
- Published
- 2021
13. DAMA: a method for computing multiple alignments of protein structures using local structure descriptors
- Author
Bogdan Lesyng, Tymoteusz Oleniecki, and Paweł Daniluk
- Subjects
Statistics and Probability ,Similarity (geometry) ,Multiple sequence alignment ,AcademicSubjects/SCI01060 ,Heuristic (computer science) ,Computer science ,Structural alignment ,Similarity measure ,Original Papers ,Structural Bioinformatics ,Biochemistry ,Computer Science Applications ,Computational Mathematics ,Identification (information) ,Computational Theory and Mathematics ,Complete information ,Data mining ,Protein topology ,Molecular Biology ,computer
- Abstract
Motivation The well-known fact that protein structures are more conserved than their sequences forms the basis of several areas of computational structural biology. Methods based on structure analysis provide more complete information on residue conservation in evolutionary processes. This is crucial for determining evolutionary relationships between proteins and for identifying recurrent structural patterns in biomolecules involved in similar functions. However, algorithmic structural alignment is much more difficult than multiple sequence alignment. This study is devoted to the development and applications of DAMA, a novel, effective environment capable of computing and analyzing multiple structure alignments. Results DAMA is based on local structural similarities. It uses local 3D structure descriptors and thus accounts for the nearest-neighbor molecular environments of aligned residues. It is constrained neither by protein topology nor by global structure. DAMA is an extension of our previous study (DEDAL), which demonstrated the applicability of local descriptors to pairwise alignment problems. Since the multiple alignment problem is NP-complete, an effective heuristic approach has been developed without imposing any artificial constraints. The alignment algorithm searches for the largest consistent ensemble of similar descriptors. The new method is capable of capturing most of the biologically significant similarities present in canonical test sets. It is also discriminatory enough to prevent the emergence of larger but meaningless solutions. Tests performed on the test sets, including protein kinases, demonstrate DAMA's capability of identifying equivalent residues, which should be very useful in discovering the biological nature of protein similarity. 
Performance profiles show the advantage of DAMA over other methods, in particular when using a strict similarity measure QC, which is the ratio of correctly aligned columns, and when applying the methods to more difficult cases. Availability and implementation DAMA is available online at http://dworkowa.imdik.pan.pl/EP/DAMA. Linux binaries of the software are available upon request. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2021
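DAMA's strict similarity measure QC is described as the ratio of correctly aligned columns. The following computes one such column score. It assumes, as in BAliBASE-style benchmarks, that a column counts as correct only if the test alignment contains exactly the same residue correspondence. Gaps are written as '-', and both alignments must list the proteins in the same order. The function names are hypothetical and not taken from DAMA.

```python
from typing import List, Optional, Sequence, Tuple

Column = Tuple[Optional[int], ...]  # residue index per protein, None for a gap


def alignment_columns(alignment: Sequence[str]) -> List[Column]:
    """Turn gapped rows (one per protein, '-' = gap) into residue-index columns."""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    counters = [0] * len(alignment)  # next residue index in each protein
    columns = []
    for pos in range(width):
        column = []
        for k, row in enumerate(alignment):
            if row[pos] == "-":
                column.append(None)
            else:
                column.append(counters[k])
                counters[k] += 1
        columns.append(tuple(column))
    return columns


def column_score(test: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of reference columns reproduced exactly in the test alignment."""
    ref_columns = [c for c in alignment_columns(reference) if any(i is not None for i in c)]
    test_columns = set(alignment_columns(test))
    if not ref_columns:
        return 0.0
    return sum(c in test_columns for c in ref_columns) / len(ref_columns)


reference = ["AC-D", "ACED"]
test = ["ACD-", "ACED"]
print(column_score(test, reference))  # 0.5
```

Comparing residue-index tuples rather than characters makes the score independent of residue identity. Only the correspondence between positions matters, which suits structure-based alignments.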
14. Computational Modeling Methods for 3D Structure Prediction of Ribozymes
- Author
Astha Joshi, Chandran Nithin, Janusz M. Bujnicki, Tomasz K Wirecki, Filip Stefaniak, and Pritha Ghosh
- Subjects
Structural bioinformatics ,biology ,Modelling methods ,Computer science ,Ribozyme ,Structure (category theory) ,Computational biology
- Published
- 2021
15. Ligand Binding Site Comparison (LiBiSCo): a web-based tool for analyzing interactions between proteins and ligands to explore amino acid specificity within active sites
- Author
Sameer Hassan, Henrik Aronsson, and Mats Töpel
- Subjects
Stereochemistry ,Protein Data Bank (RCSB PDB) ,Ligands ,Biochemistry ,Structural bioinformatics ,Protein structure ,Structural Biology ,Catalytic Domain ,Drug Discovery ,Humans ,Protein Interaction Domains and Motifs ,Binding site ,Databases, Protein ,Molecular Biology ,Internet ,Chemistry ,Drug discovery ,Computational Biology ,Proteins ,Protein Data Bank ,Ligand (biochemistry) ,Amino acid ,computer ,Software ,Protein Binding
- Abstract
Interactions between proteins and ligands are ubiquitous in biological cells, and understanding these interactions at the atomic level in protein-ligand complexes is crucial for structural bioinformatics and drug discovery. Here, we present a web-based protein-ligand interaction application named Ligand Binding Site Comparison (LiBiSCo). It compares the amino acid residues that interact with the atoms of a ligand molecule across different protein-ligand complexes available in the Protein Data Bank (PDB). The comparison is performed at the ligand-atom level, irrespective of whether the protein structures of interest share binding-site similarity. LiBiSCo takes one or several PDB IDs of protein-ligand complexes as input. It returns a list of the identified interactions at the ligand-atom level, including both bonded and non-bonded interactions. A sequence profile of the interacting residues for each ligand atom is provided as a WebLogo. LiBiSCo is useful for understanding ligand-binding specificity and structural promiscuity among structurally unrelated families. The LiBiSCo tool can be accessed through https://albiorix.bioenv.gu.se/LiBiSCo/HomePage.py.
- Published
- 2021
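LiBiSCo reports, for each ligand atom, the residues it interacts with and summarizes them across complexes as a sequence profile. The following illustrates that idea with a purely distance-based contact criterion, using an assumed cutoff of 4.0 Å. It does not reproduce LiBiSCo's classification of bonded and non-bonded interactions. It also assumes that ligand atom names are consistent across complexes, and all data structures and names are hypothetical. Note that `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
LigandAtom = Tuple[str, Point]                 # (ligand atom name, xyz)
Residues = Dict[str, Tuple[str, List[Point]]]  # residue id -> (residue name, atom xyz list)


def contacts_per_ligand_atom(
    ligand: Sequence[LigandAtom], residues: Residues, cutoff: float = 4.0
) -> Dict[str, List[str]]:
    """For each ligand atom, list residue names with any atom within `cutoff` Å."""
    result: Dict[str, List[str]] = {}
    for atom_name, xyz in ligand:
        result[atom_name] = [
            res_name
            for res_name, coords in residues.values()
            if any(math.dist(xyz, c) <= cutoff for c in coords)
        ]
    return result


def residue_profile(per_complex: Sequence[Dict[str, List[str]]]) -> Dict[str, Counter]:
    """Count, per ligand atom, how often each residue type contacts it across complexes."""
    profile: Dict[str, Counter] = {}
    for contacts in per_complex:
        for atom_name, res_names in contacts.items():
            profile.setdefault(atom_name, Counter()).update(res_names)
    return profile


# Two made-up complexes of the same ligand (same pose used for simplicity)
ligand = [("O1", (0.0, 0.0, 0.0)), ("C2", (10.0, 0.0, 0.0))]
complex_1 = {"A57": ("HIS", [(0.0, 3.0, 0.0)]), "A102": ("ASP", [(10.0, 0.0, 5.0)])}
complex_2 = {"B60": ("HIS", [(0.0, 0.0, 2.5)]), "B99": ("SER", [(10.0, 0.0, 3.5)])}
profile = residue_profile(
    [contacts_per_ligand_atom(ligand, complex_1), contacts_per_ligand_atom(ligand, complex_2)]
)
print(dict(profile["O1"]), dict(profile["C2"]))  # {'HIS': 2} {'SER': 1}
```

The per-atom counters are the kind of input a sequence-logo tool such as WebLogo visualizes.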
16. Topology evaluation of models for difficult targets in the 14th round of the critical assessment of protein structure prediction (CASP14)
- Author
Jimin Pei, R. Dustin Schaeffer, Andriy Kryshtafovych, Lisa N. Kinch, and Nick V. Grishin
- Subjects
Models, Molecular ,Structure (mathematical logic) ,Protein Folding ,Protein Conformation ,Computer science ,Zhàng ,Computational Biology ,Proteins ,Protein structure prediction ,Topology ,Biochemistry ,Article ,Structural bioinformatics ,Ranking ,Sequence Analysis, Protein ,Structural Biology ,Quality (business) ,Databases, Protein ,CASP ,Molecular Biology ,Software ,Topology (chemistry)
- Abstract
This report describes the assessment of tertiary structure predictions for difficult modeling targets in the 14th round of the Critical Assessment of Structure Prediction (CASP14). We implemented an official ranking scheme that used the same scores as the previous CASP topology-based assessment but combined them with a score that emphasized physically realistic models. The top-performing AlphaFold2 group outperformed the rest of the prediction community on all but two of the difficult targets considered in this assessment. It provided high-quality models for most targets (86% with GDT_TS above 70), including larger targets of more than 150 residues, and correctly predicted the topology of almost all the rest. AlphaFold2 was followed by two manual Baker methods and a Feig method that refined Zhang-server models. Next came two notable automated Zhang server methods (QUARK and Zhang-server) and a Zhang manual group. Despite the remarkable progress in predicting the structure of difficult targets, both the prediction community and, to a lesser extent, AlphaFold2 faced challenges with flexible regions and obligate oligomeric assemblies. The official ranking of top-performing methods was supported by PCA and heatmap clusters generated from the performance scores. These gave insight into target difficulty and the most successful state-of-the-art structure prediction methodologies.
- Published
- 2021
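GDT_TS, the score behind the "86% with GDT_TS above 70" figure in the CASP14 assessment above, averages the percentage of model Cα atoms that lie within 1, 2, 4 and 8 Å of their reference positions. The sketch below is a simplified version that assumes per-residue Cα deviations have already been computed under one fixed superposition; the official implementation searches for the best superposition separately at each cutoff, so this simplified value will generally be somewhat lower.

```python
from typing import Sequence

GDT_CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # distance cutoffs in angstroms

def gdt_ts(deviations: Sequence[float]) -> float:
    """Simplified GDT_TS from per-residue C-alpha deviations (angstroms).

    Assumes one fixed model-to-target superposition (a simplification;
    the official score optimizes the superposition per cutoff).
    """
    if len(deviations) == 0:
        raise ValueError("need at least one residue")
    n = len(deviations)
    # Percentage of residues whose deviation is within each cutoff.
    percents = [100.0 * sum(d <= c for d in deviations) / n for c in GDT_CUTOFFS]
    return sum(percents) / len(GDT_CUTOFFS)

print(gdt_ts([0.5, 1.5, 3.0, 6.0, 12.0]))  # 50.0
```

In the usage line, the four cutoffs capture 20%, 40%, 60% and 80% of the five residues, so the average is 50.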
17. Strategies for drug target identification in Mycobacterium leprae
- Author
-
Marta Acebrón-García-de-Eulate, Tom L. Blundell [ORCID 0000-0002-2708-8992], and Sundeep Chaitanya Vedithi [ORCID 0000-0003-3474-4705] (record from Apollo - University of Cambridge Repository)
- Subjects
0301 basic medicine ,In silico ,Drug target ,Druggability ,Leprostatic Agents ,Disease ,Computational biology ,Biology ,03 medical and health sciences ,0302 clinical medicine ,Bacterial Proteins ,Leprosy ,Drug Discovery ,medicine ,Animals ,Humans ,Computer Simulation ,Mycobacterium leprae ,Pharmacology ,Structure–activity relationship, drug target ,biology.organism_classification ,medicine.disease ,Hansen’s disease ,Gene Ontology ,030104 developmental biology ,Structural bioinformatics ,Drug Design ,030220 oncology & carcinogenesis ,Identification (biology) - Abstract
Hansen’s disease (HD), or leprosy, remains endemic in many parts of the world. Although multidrug therapy (MDT) cures a large number of patients, some abandon it because of its long duration. Identifying new drug targets in Mycobacterium leprae is therefore considered highly important. Here, we present an overview of in silico and in vitro studies that could help in this endeavor. The essentiality of M. leprae proteins is reviewed, with discussion of flux balance analysis, gene expression, and gene knockout studies. Finally, druggability techniques are proposed for the validation of new M. leprae protein targets (see Fig. 1).
- Published
- 2021
18. Mathematical Multidimensional Modelling and Structural Artificial Intelligence Pipelines Provide Insights for the Designing of Highly Specific Anti-SARS-CoV2 Agents
- Author
-
Panayiotis Vlamos and Dimitrios Vlachakis
- Subjects
Artificial intelligence ,2019-20 coronavirus outbreak ,Mathematical modelling ,Coronavirus disease 2019 (COVID-19) ,SARS-CoV-2 ,COVID19 ,Computer science ,business.industry ,Applied Mathematics ,Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) ,010102 general mathematics ,Chemical data ,0102 computer and information sciences ,01 natural sciences ,Article ,Drug design ,Computational Mathematics ,Structural bioinformatics ,Computational Theory and Mathematics ,010201 computation theory & mathematics ,0101 mathematics ,business - Abstract
COVID-19 is the most impactful pandemic of recent times worldwide. It is a highly infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). To date, there is no specific drug or vaccine against COVID-19. Therefore, the need for novel and pioneering anti-COVID-19 agents is of paramount importance. In this direction, computer-aided drug design constitutes a very promising antiviral approach for the discovery and analysis of drugs and molecules with biological activity against SARS-CoV-2. In silico modelling takes advantage of the massive amounts of biological and chemical data available on the nature of the interactions between targeted systems and molecules, as well as the rapid progress of computational tools and software. Herein, we describe the potential of merging mathematical modelling, artificial intelligence and learning techniques into seamless computational pipelines for the rapid and efficient discovery and design of potent anti-SARS-CoV-2 modulators.
- Published
- 2021
19. TOUGH-M2
- Author
-
Brylinski, Michal
- Subjects
Computational biology ,Structural bioinformatics ,Local pocket alignments ,Pocket matching ,Sequence order-independent alignments ,eFindSite ,Ligand binding ,Drug binding pockets - Abstract
Dataset to evaluate algorithms for binding site matching
- Published
- 2022
20. TOUGH-M1
- Author
-
Brylinski, Michal and Scott, Oliver
- Subjects
Structural bioinformatics ,Local pocket alignments ,Pocket matching ,Sequence order-independent alignments ,Computational Biology ,Fpocket ,Ligand binding ,Drug binding pockets - Abstract
Dataset to evaluate algorithms for binding site matching
- Published
- 2022
21. Structural Achievability of an NH-π Interaction between Gln and Phe in a Crystal Structure of a Collagen-like Peptide
- Author
-
Ruixue Zhang, You Xu, Jun Lan, Shilong Fan, Jing Huang, and Fei Xu
- Subjects
Solvents ,Collagen ,Peptides ,Molecular Biology ,Biochemistry ,Hydrophobic and Hydrophilic Interactions ,NH–π interactions ,side chain interactions ,X-ray crystallography ,collagen ,molecular dynamics ,quantum chemistry ,circular dichroism ,structural bioinformatics - Abstract
NH–π interactions between polar and aromatic residues are widely distributed in proteins, and their stabilizing effects have been investigated in both globular and fibrous proteins. To gain structural insight into side-chain NH–π interactions, we solved the crystal structure of a collagen-like peptide containing Gln-Phe pairs. The Gln-Phe NH–π interactions were further characterized by quantum calculations, molecular simulations, and structural bioinformatics. The analyses indicated that NH–π interactions are robust under various solvent conditions, can occur either on the protein surface or in its hydrophobic core, and can form over a wide range of inter-residue distances. This study suggests that NH–π interactions can play a versatile role in protein design, including the engineering of hydrophobic cores, solvent-accessible surfaces, and protein–protein interfaces.
- Published
- 2022
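One common way to screen structures for side-chain NH–π contacts like the Gln-Phe pairs in the entry above uses two geometric descriptors: the distance from the donor nitrogen (for Gln, the side-chain amide N) to the aromatic ring centroid, and the angle between that vector and the ring normal. The NumPy sketch below computes both; the 4.5 Å and 40° thresholds in `is_nh_pi` are hypothetical placeholders, not the criteria used in the study.

```python
import numpy as np

def nh_pi_geometry(n_atom, ring_atoms):
    """Return (distance, angle) between a donor N atom and an aromatic ring.

    distance: N-to-ring-centroid distance in angstroms.
    angle: degrees between the centroid->N vector and the ring normal
           (0 means N sits directly above the ring face).
    """
    ring = np.asarray(ring_atoms, dtype=float)
    n = np.asarray(n_atom, dtype=float)
    centroid = ring.mean(axis=0)
    # The ring normal is the direction of least variance of the ring atoms.
    _, _, vt = np.linalg.svd(ring - centroid)
    normal = vt[-1]
    vec = n - centroid
    distance = float(np.linalg.norm(vec))
    # abs() makes the result independent of the normal's arbitrary sign.
    cos_angle = abs(float(np.dot(vec, normal))) / distance
    angle = float(np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0))))
    return distance, angle

def is_nh_pi(n_atom, ring_atoms, max_dist=4.5, max_angle=40.0):
    """Hypothetical thresholds, for illustration only."""
    d, a = nh_pi_geometry(n_atom, ring_atoms)
    return d <= max_dist and a <= max_angle

# Idealized six-membered ring in the xy-plane (radius 1.39 angstroms).
theta = np.linspace(0.0, 2.0 * np.pi, 6, endpoint=False)
ring = np.column_stack([1.39 * np.cos(theta), 1.39 * np.sin(theta), np.zeros(6)])
print(nh_pi_geometry([0.0, 0.0, 3.5], ring))  # N directly above the centroid
```

Fitting the normal by SVD rather than from three chosen atoms keeps the descriptor stable for slightly non-planar rings in real structures.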
22. Glucosamine-6-phosphate synthase from Mycobacterium tuberculosis: an in silico study for the prediction of a refined three-dimensional model
- Author
-
Wildrimak S. Pereira, Ricardo Martins Ramos, Rômulo O. Barros, Jhonatan Matheus Sousa Costa, and Fabio Luis Cardoso Costa Junior
- Subjects
chemistry.chemical_classification ,Tuberculosis ,ATP synthase ,biology ,Uridine biosynthesis ,medicine.disease ,biology.organism_classification ,World health ,Mycobacterium tuberculosis ,Structural bioinformatics ,Enzyme ,chemistry ,Biochemistry ,medicine ,biology.protein ,Infectious agent - Abstract
Tuberculosis is one of the main causes of death from an infectious agent worldwide, according to the World Health Organization. Studies indicate that enzymes involved in the biosynthesis of uridine diphosphate-N-acetylglucosamine are essential for the life cycle of the bacterium. One of these enzymes is glucosamine-6-phosphate synthase (GlmS), for which no three-dimensional structure is available in public protein structure databases. In this work, structural bioinformatics methods (comparative modeling and molecular refinement) were used to build a refined three-dimensional model of the GlmS enzyme of Mycobacterium tuberculosis. The model was generated using four crystallographic template structures. The stereochemical and general parameters obtained for the refined model were better than those of the original model and similar to those of the template structures, validating the refined model.
- Published
- 2021
23. Atomic-level evolutionary information improves protein–protein interface scoring
- Author
-
Raphael Guerois, Chloé Quignot, Pierre Granger, Pablo Chacón, and Jessica Andreani. Affiliations: Institut de Biologie Intégrative de la Cellule (I2BC), Commissariat à l'énergie atomique et aux énergies alternatives (CEA), Université Paris-Saclay, Centre National de la Recherche Scientifique (CNRS); Institut des Sciences du Vivant Frédéric JOLIOT (JOLIOT), Direction de Recherche Fondamentale (DRF), CEA; Instituto de Química Física Rocasolano (IQFR), Consejo Superior de Investigaciones Científicas (CSIC), Madrid. Funding: ANR-15-CE11-0008 CHIPSeT (Structural characterization of the couplings between chromatin and protein homeostasis by combining coevolution analyses and high-throughput perturbation of complex interfaces, 2015); ANR-18-CE45-0005 ESPRINet (Integration of heterogeneous evolutionary, structural and omics data for the prediction of protein–RNA interaction networks, 2018)
- Subjects
Statistics and Probability ,Computer science ,Interface (computing) ,protein-protein interactions ,Machine learning ,computer.software_genre ,Biochemistry ,Protein–protein interaction ,03 medical and health sciences ,0302 clinical medicine ,[SDV.BBM]Life Sciences [q-bio]/Biochemistry, Molecular Biology ,Evolutionary information ,protein structure ,[SDV.BBM.BC]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Biochemistry [q-bio.BM] ,protein evolution ,Molecular Biology ,protein scoring ,030304 developmental biology ,0303 health sciences ,business.industry ,Protein protein ,030302 biochemistry & molecular biology ,Percentage point ,structural bioinformatics ,Benchmarking ,[SDV.BBM.BC]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Biomolecules [q-bio.BM] ,protein docking ,Computer Science Applications ,[SDV.BBM.BP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Biophysics ,Computational Mathematics ,Computational Theory and Mathematics ,Docking (molecular) ,Complementarity (molecular biology) ,Container (abstract data type) ,Benchmark (computing) ,Artificial intelligence ,[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM] ,business ,Statistical potential ,computer ,030217 neurology & neurosurgery - Abstract
Motivation The crucial role of protein interactions and the difficulty in characterizing them experimentally strongly motivates the development of computational approaches for structural prediction. Even when protein–protein docking samples correct models, current scoring functions struggle to discriminate them from incorrect decoys. The previous incorporation of conservation and coevolution information has shown promise for improving protein–protein scoring. Here, we present a novel strategy to integrate atomic-level evolutionary information into different types of scoring functions to improve their docking discrimination. Results We applied this general strategy to our residue-level statistical potential from InterEvScore and to two atomic-level scores, SOAP-PP and Rosetta interface score (ISC). Including evolutionary information from as few as 10 homologous sequences improves the top 10 success rates of individual atomic-level scores SOAP-PP and Rosetta ISC by 6 and 13.5 percentage points, respectively, on a large benchmark of 752 docking cases. The best individual homology-enriched score reaches a top 10 success rate of 34.4%. A consensus approach based on the complementarity between different homology-enriched scores further increases the top 10 success rate to 40%. Availability and implementation All data used for benchmarking and scoring results, as well as a Singularity container of the pipeline, are available at http://biodev.cea.fr/interevol/interevdata/. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2021
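The "top 10 success rate" reported for the interface-scoring entry above is, in docking benchmarks, the share of cases in which at least one near-native model is ranked among the top 10 by the score. A minimal sketch, assuming each case is given as (score, is_near_native) pairs; how "near-native" is defined (for example, a CAPRI-style acceptable model) and whether lower scores are better are inputs here, not details taken from the paper.

```python
from typing import Dict, List, Tuple

# Each docking case maps to a list of (score, is_near_native) decoys.
Case = List[Tuple[float, bool]]

def top_n_success_rate(cases: Dict[str, Case], n: int = 10,
                       lower_is_better: bool = True) -> float:
    """Percentage of cases with at least one near-native model in the top n."""
    if not cases:
        raise ValueError("no cases given")
    hits = 0
    for decoys in cases.values():
        # Rank decoys by score; energy-like scores rank ascending.
        ranked = sorted(decoys, key=lambda d: d[0], reverse=not lower_is_better)
        if any(native for _, native in ranked[:n]):
            hits += 1
    return 100.0 * hits / len(cases)

# Hypothetical toy benchmark with two cases.
cases = {
    "case_A": [(-10.0, True), (-5.0, False)],
    "case_B": [(-3.0, False), (1.0, True)],
}
print(top_n_success_rate(cases, n=1))  # 50.0
```

A gain "in percentage points" is then simply the difference between two such rates computed on the same benchmark.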
24. A Review on Design and Development of Performance Evaluation Model for Bio-Informatics Data Using Hadoop
- Author
-
Ravi Kumar A
- Subjects
Computer science ,Interface (Java) ,business.industry ,General Mathematics ,Distributed computing ,Cloud computing ,computer.file_format ,Asset (computer security) ,Education ,Computational Mathematics ,Structural bioinformatics ,Workflow ,Computational Theory and Mathematics ,Scalability ,Executable ,Cluster analysis ,business ,computer - Abstract
The paper reviews the use of the Hadoop platform in structural bioinformatics applications. Hadoop offers an alternative framework for structural bioinformatics to analyze large fractions of the Protein Data Bank, which is crucial for high-throughput investigations of, for example, protein-ligand docking, protein-ligand complex clustering, and structural alignment. Specifically, we review different Hadoop-based implementations of high-throughput analyses reported in the literature and their scalability. Rather than rewriting the algorithms, we find that these deployments typically call existing executables from MapReduce. Scalability varies in comparison with other batch schedulers, particularly because direct comparisons on the same platform are usually not available. Direct comparisons of Hadoop with batch schedulers are missing from the literature, but we note that there is some evidence that MPI implementations scale better than Hadoop. The difficulty of the interface and of configuring a resource to use Hadoop is a significant obstacle to the use of the Hadoop ecosystem. This should improve over time as interfaces to Hadoop (such as Spark) improve, the use of cloud platforms increases, and standardized approaches, for example workflow languages, are taken up.
- Published
- 2021
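The MapReduce pattern discussed in the Hadoop review above can be illustrated without a cluster: a map step runs independently on each input record (here, one structure entry) and emits key/value pairs, and a reduce step groups the pairs by key and combines them. The pure-Python sketch below counts residue types across entries; the entry names and residue lists are hypothetical toy data, and a Hadoop deployment would instead distribute the map calls across nodes, often by invoking an existing executable per record.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

def map_entry(residues: List[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (residue_name, 1) for every residue in one entry."""
    for res in residues:
        yield res, 1

def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Shuffle + reduce step: group pairs by key and sum their values."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Hypothetical toy input: entry name -> residue names.
entries = {
    "entry_1": ["ALA", "GLY", "ALA"],
    "entry_2": ["GLY", "PHE"],
}
mapped = chain.from_iterable(map_entry(v) for v in entries.values())
print(reduce_counts(mapped))  # {'ALA': 2, 'GLY': 2, 'PHE': 1}
```

Because each `map_entry` call depends only on its own record, the map phase parallelizes trivially, which is what makes the pattern attractive for scanning large fractions of the Protein Data Bank.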
25. Biobox: a toolbox for biomolecular modelling
- Author
-
Justin L. P. Benesch, Matteo T. Degiacomi, Lucas S. P. Rudden, and Samuel C. Musson
- Subjects
Statistics and Probability ,AcademicSubjects/SCI01060 ,Computer science ,business.industry ,Python (programming language) ,Applications Notes ,Structural Bioinformatics ,Biochemistry ,Toolbox ,Computer Science Applications ,Computational Mathematics ,Software ,Computational Theory and Mathematics ,Modelling methods ,Software engineering ,business ,Molecular Biology ,computer ,computer.programming_language - Abstract
Motivation The implementation of biomolecular modelling methods and analyses can be cumbersome, often carried out with in-house software reimplementing common tasks, and requiring the integration of diverse software libraries. Results We present Biobox, a Python-based toolbox facilitating the implementation of biomolecular modelling methods. Availability and implementation Biobox is freely available on https://github.com/degiacom/biobox, along with its API and interactive Jupyter notebook tutorials.
- Published
- 2021
26. Reaction Pathway Sampling and Free-Energy Analyses for Multimeric Protein Complex Disassembly by Employing Hybrid Configuration Bias Monte Carlo/Molecular Dynamics Simulation
- Author
-
Ikuo Kurisaki and Shigenori Tanaka
- Subjects
Materials science ,Multiprotein complex ,Pentamer ,General Chemical Engineering ,Monte Carlo method ,General Chemistry ,Mass spectrometry ,Oligomer ,Article ,Dissociation (psychology) ,Chemistry ,Molecular dynamics ,chemistry.chemical_compound ,Structural bioinformatics ,chemistry ,medicine ,medicine.symptom ,Biological system ,QD1-999 - Abstract
Physicochemical characterization of multimeric biomacromolecule assembly and disassembly processes is a milestone toward understanding the mechanisms of biological phenomena at the molecular level. Mass spectrometry (MS) and structural bioinformatics (SB) approaches can now identify subcomplexes involved in assembly and disassembly, but they cannot provide atomic information sufficient for free-energy calculations that characterize the transition mechanism between two different sets of subcomplexes. To combine observations derived from MS and SB approaches with conventional free-energy calculation protocols, we designed a new reaction pathway sampling method employing a hybrid configuration bias Monte Carlo/molecular dynamics (hcbMC/MD) scheme and applied it to simulate the disassembly process of the serum amyloid P component (SAP) pentamer. Our results are consistent with those of earlier MS and SB studies with respect to SAP subcomplex species and the initial stage of SAP disassembly. Furthermore, we observed a novel dissociation event, a ring-opening reaction of the SAP pentamer. By combining free-energy calculations with the hcbMC/MD reaction pathway trajectories, we also obtained experimentally testable observations on (1) the reaction time of the ring-opening reaction and (2) the importance of Asp42 and Lys117 for the stable formation of the SAP oligomer.
- Published
- 2021
27. Generating property-matched decoy molecules using deep learning
- Author
-
Fergus Imrie, Anthony R. Bradley, and Charlotte M. Deane
- Subjects
Statistics and Probability ,AcademicSubjects/SCI01060 ,Property (programming) ,Computer science ,Machine learning ,computer.software_genre ,01 natural sciences ,Biochemistry ,03 medical and health sciences ,Molecular recognition ,Code (cryptography) ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,Virtual screening ,business.industry ,Deep learning ,Pattern recognition ,Construct (python library) ,Original Papers ,Structural Bioinformatics ,0104 chemical sciences ,Computer Science Applications ,010404 medicinal & biomolecular chemistry ,Computational Mathematics ,Computational Theory and Mathematics ,Artificial intelligence ,Decoy ,business ,computer - Abstract
Motivation An essential step in the development of virtual screening methods is the use of established sets of actives and decoys for benchmarking and training. However, the decoy molecules in commonly used sets are biased, meaning that methods often exploit these biases to separate actives and decoys and do not necessarily learn to perform molecular recognition. This fundamental issue prevents generalization and hinders virtual screening method development. Results We have developed a deep learning method (DeepCoy) that generates decoys to a user’s preferred specification in order to remove such biases or construct sets with a defined bias. We validated DeepCoy using two established benchmarks, DUD-E and DEKOIS 2.0. For all 102 DUD-E targets and 80 of the 81 DEKOIS 2.0 targets, our generated decoy molecules more closely matched the active molecules’ physicochemical properties while introducing no discernible additional risk of false negatives. The DeepCoy decoys improved the Deviation from Optimal Embedding (DOE) score by an average of 81% and 66%, respectively, decreasing from 0.166 to 0.032 for DUD-E and from 0.109 to 0.038 for DEKOIS 2.0. Further, the generated decoys are harder to distinguish than the original decoy molecules via docking with AutoDock Vina, with virtual screening performance falling from an AUC ROC of 0.70 to 0.63. Availability and implementation The code is available at https://github.com/oxpig/DeepCoy. Generated molecules can be downloaded from http://opig.stats.ox.ac.uk/resources. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2021
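The AUC ROC values in the DeepCoy entry above measure how well docking scores separate actives from decoys: the AUC equals the probability that a randomly chosen active is ranked better than a randomly chosen decoy, so 0.5 corresponds to random ranking and the drop from 0.70 to 0.63 means the new decoys are harder to separate. A minimal pairwise (Mann–Whitney style) sketch, assuming docking-style scores where lower is better; ties count as half a win.

```python
from typing import Sequence

def roc_auc(active_scores: Sequence[float], decoy_scores: Sequence[float],
            lower_is_better: bool = True) -> float:
    """ROC AUC as the probability that a random active outranks a random decoy.

    Ties count as half a win. O(n * m), fine for small illustrative sets.
    """
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            better = a < d if lower_is_better else a > d
            if better:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))

# Hypothetical docking scores (kcal/mol, lower is better).
print(roc_auc([-9.0, -7.5], [-8.0, -6.0, -7.5]))  # 0.75
```

For large screening sets, a rank-based formula or a library routine avoids the quadratic loop, but the pairwise form makes the probabilistic meaning explicit.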
28. Folded domain charge properties influence the conformational behavior of disordered tails
- Author
-
Alex S. Holehouse and Ishan Taneja
- Subjects
Physics ,Intrinsically disordered proteins ,Sequence design ,QH301-705.5 ,Charge (physics) ,Context (language use) ,All-atom simulations ,Article ,Domain (software engineering) ,Structural bioinformatics ,Many core ,Structural Biology ,Chemical physics ,Conformational ensemble ,Surface charge ,Biology (General) ,Molecular Biology - Abstract
Intrinsically disordered proteins and protein regions (IDRs) make up around 30% of the human proteome where they play essential roles in dictating and regulating many core biological processes. While IDRs are often studied as isolated domains, in naturally occurring proteins most IDRs are found adjacent to folded domains, where they exist as either N- or C-terminal tails or as linkers connecting two folded domains. Prior work has shown that charge properties of IDRs can influence their conformational behavior, both in isolation and in the context of folded domains. In contrast, the converse scenario is less well-explored: how do the charge properties of folded domains influence IDR conformational behavior? To answer this question, we combined a large-scale structural bioinformatics analysis with all-atom implicit solvent simulations of both rationally designed and naturally occurring proteins. Our results reveal three key takeaways. Firstly, the relative position and accessibility of charged residues across the surface of a folded domain can dictate IDR conformational behavior, overriding expectations based on net surface charge properties. Secondly, naturally occurring proteins possess multiple charge patches that are physically accessible to local IDRs. Finally, even modest changes in the local electrostatic environment of a folded domain can substantially modulate IDR-folded domain interactions. Taken together, our results suggest that folded domain surfaces can act as local determinants of IDR conformational behavior.
Highlights:
• Intrinsically disordered regions (IDRs) are mostly found adjacent to folded domains.
• Here we propose that the folded domain surface properties influence IDR behavior.
• We combine all-atom simulations and sequence design of IDRs and folded domains.
• IDR conformational behavior is determined by a complex combination of factors.
• Folded domains can substantially alter IDR conformational biases.
- Published
- 2021
29. Problem-oriented software package for the numerical solution and research of structural bioinformatics problems using stochastic optimization methods
- Author
-
N. Ershov and S. Poluyan
- Subjects
0303 health sciences ,03 medical and health sciences ,Mathematical optimization ,Structural bioinformatics ,Computer science ,030302 biochemistry & molecular biology ,Stochastic optimization ,Software package ,030304 developmental biology - Abstract
This paper presents a problem-oriented software package for performing computational experiments on structural bioinformatics problems: protein structure prediction and peptide-protein docking. These problems are formulated as continuous global optimization tasks. The primary purpose of the package is to provide functionality for performing computational experiments using various stochastic optimization methods. For each selected task, the objective function and search space are provided to the user. This work presents the functionality of the package, its implementation features, and the results of various experiments. The software is written in C++ and supports parallel computing using OpenMP. The package is open-source software hosted in GitHub repositories.
- Published
- 2020
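The package in the entry above treats structure prediction and docking as continuous global optimization over a bounded search space with a supplied objective function. As an illustration of one stochastic method of that kind, here is a minimal simulated-annealing sketch over a box-bounded space; the package itself is written in C++, and the toy sphere objective, cooling schedule and all parameter values below are illustrative assumptions, not the package's API or methods.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

def simulated_annealing(f: Callable[[Sequence[float]], float],
                        bounds: List[Tuple[float, float]],
                        iters: int = 5000, t0: float = 1.0,
                        step: float = 0.02, seed: int = 0):
    """Minimize f over a box using simulated annealing with linear cooling.

    `step` is the proposal standard deviation as a fraction of each box width.
    Returns the best point found and its objective value.
    """
    rng = random.Random(seed)
    x = [rng.uniform(lo, hi) for lo, hi in bounds]
    fx = f(x)
    best_x, best_f = list(x), fx
    for k in range(iters):
        t = t0 * (1.0 - k / iters) + 1e-12  # temperature decreases toward 0
        # Gaussian proposal, clipped back into the search box.
        cand = [min(hi, max(lo, xi + rng.gauss(0.0, step * (hi - lo))))
                for xi, (lo, hi) in zip(x, bounds)]
        fc = f(cand)
        # Metropolis criterion: accept improvements, sometimes accept worse moves.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = list(x), fx
    return best_x, best_f

def sphere(v: Sequence[float]) -> float:
    """Toy objective with its global minimum 0 at the origin."""
    return sum(vi * vi for vi in v)

x, fx = simulated_annealing(sphere, [(-5.0, 5.0)] * 3)
print(fx)
```

For a real task, `sphere` would be replaced by an energy or scoring function over, for example, torsion angles or rigid-body parameters, and `bounds` by the corresponding search space.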
30. The impact of structural bioinformatics tools and resources on SARS-CoV-2 research and therapeutic strategies
- Author
-
Mihaly Varadi, Vincent Zoete, Sameer Velankar, Neeladri Sen, Antoine Daina, Shoshana J. Wodak, Christine A. Orengo, and Vaishali P Waman
- Subjects
AcademicSubjects/SCI01060 ,Coronavirus disease 2019 (COVID-19) ,Protein Conformation ,Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) ,protein 3D structures ,Disease ,Computational biology ,Biology ,Antiviral Agents ,Viral Proteins ,03 medical and health sciences ,Structural bioinformatics ,Protein structure ,therapeutics ,Humans ,Molecular Biology ,Host protein ,Method Review ,030304 developmental biology ,0303 health sciences ,SARS-CoV-2 ,030302 biochemistry & molecular biology ,COVID-19 ,Computational Biology ,structural bioinformatics ,computer.file_format ,Protein Data Bank ,structure prediction ,mutation/variation ,COVID-19 Drug Treatment ,3. Good health ,Structural biology ,computer ,Information Systems - Abstract
SARS-CoV-2 is the causative agent of COVID-19, the ongoing global pandemic. It has posed a worldwide challenge to human health, as no effective treatment is currently available to combat the disease. Its severity has led to unprecedented collaborative initiatives for therapeutic solutions against COVID-19. Studies applying structure-based drug design to COVID-19 are numerous and show good promise. Structural biology provides key insights into the 3D structures of SARS-CoV-2 proteins and into critical residues/mutations implicated in infectivity, molecular recognition and susceptibility to a broad range of host species. A detailed understanding of viral proteins and their complexes with host receptors and candidate epitope/lead compounds is the key to developing a structure-guided therapeutic design. Since the discovery of SARS-CoV-2, the structures of several of its proteins have been determined experimentally at unprecedented speed and deposited in the Protein Data Bank. Further, specialized structural bioinformatics tools and resources have been developed for theoretical models, data on protein dynamics from computer simulations, the impact of variants/mutations, and molecular therapeutics. Here, we provide an overview of ongoing efforts to develop structural bioinformatics tools and resources for COVID-19 research. We also discuss the impact of these resources and structure-based studies on understanding various aspects of SARS-CoV-2 infection and therapeutic development. These include (i) understanding differences between SARS-CoV-2 and SARS-CoV that lead to the increased infectivity of SARS-CoV-2, (ii) deciphering key residues in SARS-CoV-2 involved in receptor–antibody recognition, (iii) analysis of variants in host proteins that affect host susceptibility to infection and (iv) analyses facilitating structure-based drug and vaccine design against SARS-CoV-2.
- Published
- 2020
31. RNA inter-nucleotide 3D closeness prediction by deep residual neural networks
- Author
-
Wenkai Wang, Saisai Sun, Jianyi Yang, and Zhenling Peng
- Subjects
Statistics and Probability ,AcademicSubjects/SCI01060 ,Computer science ,Closeness ,Residual ,Biochemistry ,03 medical and health sciences ,Nucleic acid structure ,Molecular Biology ,Protein secondary structure ,030304 developmental biology ,0303 health sciences ,Artificial neural network ,Nucleotides ,030302 biochemistry & molecular biology ,Computational Biology ,Contrast (statistics) ,Covariance ,Original Papers ,Structural Bioinformatics ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Test set ,RNA ,Neural Networks, Computer ,Sequence Alignment ,Algorithm ,Algorithms - Abstract
Motivation In recent years, deep neural networks have been shown to predict inter-residue contacts and distances in proteins accurately, significantly improving the accuracy of predicted protein structure models. In contrast, fewer studies have addressed the prediction of RNA inter-nucleotide 3D closeness. Results We propose a new algorithm, RNAcontact, for the prediction of RNA inter-nucleotide 3D closeness. RNAcontact is built on deep residual neural networks, using covariance information from multiple sequence alignments and the predicted secondary structure as input features. Experiments show that RNAcontact achieves precisions of 0.8 and 0.6 for the top L/10 and top L predictions (where L is the length of the RNA), respectively, on an independent test set, significantly higher than other evolutionary coupling methods. Analysis shows that about one third of the correctly predicted 3D closenesses are not secondary-structure base pairs, which are critical to the determination of RNA structure. In addition, we demonstrate that the predicted 3D closeness can be used as distance restraints to guide RNA structure folding with the 3dRNA package: models built using the predicted 3D closeness are more accurate than models built without it. Availability and implementation The webserver and a standalone package are available at: http://yanglab.nankai.edu.cn/RNAcontact/. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2020
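The "top L/10" and "top L" precisions in the RNAcontact entry above follow the usual contact-prediction evaluation: rank all nucleotide pairs by predicted probability, keep the k best (k = L/10 or k = L for an RNA of length L), and report the fraction that are true 3D contacts. A minimal NumPy sketch; the minimum sequence separation `min_sep` is an assumed parameter, since the entry does not state which pairs were excluded.

```python
import numpy as np

def top_k_precision(prob, native_pairs, k, min_sep=1):
    """Precision of the k highest-scoring pairs (i, j) with j - i >= min_sep.

    prob: (L, L) array of predicted closeness probabilities.
    native_pairs: set of true (i, j) contact tuples with i < j.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    prob = np.asarray(prob, dtype=float)
    L = prob.shape[0]
    # Upper-triangle pairs only, so each (i, j) is counted once.
    i_idx, j_idx = np.triu_indices(L, k=min_sep)
    order = np.argsort(-prob[i_idx, j_idx], kind="stable")[:k]
    hits = sum((int(i_idx[o]), int(j_idx[o])) in native_pairs for o in order)
    return hits / k

# Toy 4-nucleotide example with a symmetric probability matrix.
L = 4
prob = np.zeros((L, L))
for (i, j), p in {(0, 3): 0.9, (1, 3): 0.8, (0, 2): 0.7, (0, 1): 0.1}.items():
    prob[i, j] = prob[j, i] = p
native_pairs = {(0, 3), (0, 2)}
print(top_k_precision(prob, native_pairs, k=2))  # 0.5
```

For a real RNA, the top L/10 setting would use `k=max(1, L // 10)`.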
32. A library of coiled-coil domains: from regular bundles to peculiar twists
- Author
-
Antonio Marinho da Silva Neto, Jan Ludwiczak, Stanislaw Dunin-Horkawicz, Adriana Bukala, and Krzysztof Szczepaniak
- Subjects
Statistics and Probability ,AcademicSubjects/SCI01060 ,Computer science ,Protein domain ,010402 general chemistry ,Topology ,01 natural sciences ,Biochemistry ,03 medical and health sciences ,Transduction (genetics) ,Structural rigidity ,Molecular Biology ,030304 developmental biology ,Coiled coil ,0303 health sciences ,computer.file_format ,Protein Data Bank ,Original Papers ,Structural Bioinformatics ,0104 chemical sciences ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Bundle ,Helix ,DNA supercoil ,computer - Abstract
Motivation Coiled coils are widespread protein domains involved in diverse processes ranging from providing structural rigidity to the transduction of conformational changes. They comprise two or more α-helices that are wound around each other to form a regular supercoiled bundle. Owing to this regularity, coiled-coil structures can be described with parametric equations, thus enabling the numerical representation of their properties, such as the degree and handedness of supercoiling, rotational state of the helices, and the offset between them. These descriptors are invaluable in understanding the function of coiled coils and designing new structures of this type. The existing tools for such calculations require manual preparation of input and are therefore not suitable for high-throughput analyses. Results To address this problem, we developed SamCC-Turbo, a software tool for fully automated, per-residue measurement of coiled coils. By surveying the Protein Data Bank with SamCC-Turbo, we generated a comprehensive atlas of ∼50 000 coiled-coil regions. This machine-learning-ready dataset provides precise measurements and also decomposes coiled-coil structures into fragments characterized by various degrees of supercoiling. The potential applications of SamCC-Turbo are exemplified by analyses in which we reveal general structural features of coiled coils involved in functions requiring conformational plasticity. Finally, we discuss further directions in the prediction and modeling of coiled coils. Availability and implementation SamCC-Turbo is available as a web server (https://lbs.cent.uw.edu.pl/samcc_turbo) and as a Python library (https://github.com/labstructbioinf/samcc_turbo), whereas the results of the Protein Data Bank scan can be browsed and downloaded at https://lbs.cent.uw.edu.pl/ccdb. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2020
33. Molecular Modelling of NONO and SFPQ Dimerization Process and RNA Recognition Mechanism
- Author
-
Laurenzi, T., Palazzolo, L., Taiana, E., Saporiti, S., Ben Mariem, O., Guerrini, U., Neri, A., and Eberini, I.
- Subjects
DBHS ,SFPQ ,RNA-Binding Proteins ,structural bioinformatics ,Settore FIS/07 - Fisica Applicata(Beni Culturali, Ambientali, Biol.e Medicin) ,DNA-Binding Proteins ,Protein Subunits ,Settore BIO/10 - Biochimica ,Humans ,RNA ,RNA Splicing Factors ,Multiple Myeloma ,PTB-Associated Splicing Factor ,NONO ,Dimerization - Abstract
NONO and SFPQ are involved in multiple nuclear processes (e.g., pre-mRNA splicing, DNA repair, and transcriptional regulation). These proteins, along with NEAT1, enable paraspeckle formation, thus promoting multiple myeloma cell survival. In this paper, we investigate NONO and SFPQ dimer stability, highlighting the structural differences between hetero- and homodimers, and model their interactions with RNA by simulating their binding to a polyG probe mimicking NEAT1 guanine-rich regions. We demonstrated in silico that NONO::SFPQ heterodimerization is a more favorable process than homodimer formation. We also show that the NONO and SFPQ RRM2 subunits are primarily required for protein-protein interactions with the other DBHS protomer. Simulation of RNA binding to NONO and SFPQ, besides confirming the importance of the RRM1 RNP signature, highlighted the role of β2 and β4 strand residues in specific RNA recognition. Moreover, we demonstrated the role of the NOPS region and of the other protomer's RRM2 β2/β3 loop in strengthening the interaction with RNA. By deepening the understanding of the interactions between RNA and DBHS dimers, our results could contribute to the design of small molecules that modulate the activity of these proteins. RNA mimetics able to bind selectively to the NONO and/or SFPQ RNA-recognition site could impair paraspeckle formation, thus representing a first step towards the discovery of drugs for multiple myeloma treatment.
- Published
- 2022
34. ClustENMD: efficient sampling of biomolecular conformational space at atomic resolution
- Author
-
Pemra Doruker, Burak Tevfik Kaynak, She Zhang, and Ivet Bahar
- Subjects
0301 basic medicine ,Statistics and Probability ,AcademicSubjects/SCI01060 ,Computer science ,01 natural sciences ,Biochemistry ,Computational science ,03 medical and health sciences ,Molecular dynamics ,0103 physical sciences ,MIT License ,Cluster analysis ,Molecular Biology ,010304 chemical physics ,business.industry ,Sampling (statistics) ,Applications Notes ,Structural Bioinformatics ,Automation ,Computer Science Applications ,Visualization ,Computational Mathematics ,030104 developmental biology ,Computational Theory and Mathematics ,Table (database) ,business ,Subspace topology - Abstract
Summary Efficient sampling of conformational space is essential for elucidating functional/allosteric mechanisms of proteins and generating ensembles of conformers for docking applications. However, unbiased sampling is still a challenge, especially for highly flexible and/or large systems. To address this challenge, we describe a new implementation of our computationally efficient algorithm ClustENMD that is integrated with the ProDy and OpenMM software packages. This hybrid method performs iterative cycles of conformer generation using an elastic network model for deformations along global modes, followed by clustering and short molecular dynamics simulations. The ProDy framework enables full automation and analysis of generated conformers and visualization of their distributions in the essential subspace. Availability and implementation ClustENMD is open-source and freely available under the MIT License from https://github.com/prody/ProDy. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2021
35. PDBe aggregated API: programmatic access to an integrative knowledge graph of molecular structure data
- Author
-
David R. Armstrong, Nurul Nadzirin, Lukáš Pravda, Mihaly Varadi, Sameer Velankar, Stephen Anyango, Saqib Mir, John M. Berrisford, Sreenath Nair, and Aleksandras Gutmanas
- Subjects
Statistics and Probability ,Supplementary data ,Information retrieval ,Source code ,Graph database ,AcademicSubjects/SCI01060 ,Computer science ,media_common.quotation_subject ,computer.file_format ,Python (programming language) ,Protein Data Bank ,computer.software_genre ,Applications Notes ,Structural Bioinformatics ,Biochemistry ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Knowledge graph ,UniProt ,Molecular Biology ,computer ,media_common ,computer.programming_language - Abstract
Summary The PDBe aggregated API is an open-access and open-source RESTful API that provides programmatic access to a wealth of macromolecular structural data and their functional and biophysical annotations through 80+ API endpoints. The API is powered by the PDBe graph database (https://pdbe.org/graph-schema), an open-access integrative knowledge graph that can be used as a discovery tool to answer complex biological questions. Availability and implementation The PDBe aggregated API provides up-to-date access to the PDBe graph database, which has weekly releases with the latest data from the Protein Data Bank, integrated with updated annotations from UniProt, Pfam, CATH, SCOP and the PDBe-KB partner resources. The complete list of all the available API endpoints and their descriptions is available at https://pdbe.org/graph-api. The source code of the Python 3.6+ API application is publicly available at https://gitlab.ebi.ac.uk/pdbe-kb/services/pdbe-graph-api. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2021
36. Discovery of new promising USP14 inhibitors: computational evaluation of the thumb-palm pocket
- Author
-
Ikponwmosa Obaseki, Oluwaseun Fapohunda, Niyi S. Adelakun, Eseiwi Obaseki, Olaposi I Omotuyi, and Ayobami Adeniyi
- Subjects
Cytoplasm ,Proteasome Endopeptidase Complex ,medicine.medical_treatment ,030303 biophysics ,Allosteric regulation ,Computational biology ,Molecular Dynamics Simulation ,Molecular mechanics ,Deubiquitinating enzyme ,03 medical and health sciences ,Structural bioinformatics ,Structural Biology ,medicine ,biochemistry ,Molecular Biology ,ADME ,0303 health sciences ,Protease ,biology ,Ubiquitin ,Drug discovery ,Chemistry ,General Medicine ,Molecular Docking Simulation ,Thumb ,biology.protein ,PubChem - Abstract
Ubiquitin-specific protease 14 (USP14) is a member of the deubiquitinating enzymes (DUBs) involved in disrupting the ubiquitin-proteasome regulation system, which is responsible for the degradation of impaired and misfolded proteins, an essential mechanism in eukaryotic cells. The involvement of USP14 in cancer progression and neurodegenerative disorders has been reported. USP14 is therefore a prime therapeutic target, and designing efficacious inhibitors against USP14 is central to curbing these conditions. Herein, we relied on structural bioinformatics methods incorporating molecular docking, molecular mechanics generalized Born surface area (MM-GBSA), molecular dynamics simulation (MD simulation), and ADME to identify potential allosteric USP14 inhibitors. A library of over 733 compounds from the PubChem repository with >90% match to the IU1 chemical structure was screened in a multi-step framework to obtain prospective drug-like inhibitors. Two potential lead compounds (CID 43013232 and CID 112370349) were shown to have better binding affinity than IU1, but differed only subtly from IU1-47, a compound 10-fold more potent than IU1. The stability of the lead molecules complexed with USP14 was studied via MD simulation. The molecules were found to be stable within the binding site throughout the 50 ns simulation. Moreover, the protein–ligand interactions across the simulation suggest Phe331, Tyr476, and Gln197 as crucial residues for USP14 inhibition. Furthermore, in silico pharmacological evaluation revealed the lead compounds to be pharmacologically sound molecules. Overall, the methods deployed in this study revealed two novel candidates that may show selective inhibitory activity against USP14, which could be exploited to produce potent and harmless USP14 inhibitors. Communicated by Ramaswamy H. Sarma
- Published
- 2020
37. Single-particle cryo-EM at atomic resolution
- Author
-
Peter Tiemeijer, Abhay Kotecha, Erwin de Jong, Takanori Nakane, Dimitri Y. Chirgadze, D. Karia, Steven W. Hardwick, Andrija Sente, Maarten Bischoff, Lina Malinauskaite, Jamie McCormack, Garib N. Murshudov, S. Masiulis, A. Radu Aricescu, Patricia M. G. E. Brown, Tomasz Uchański, Jonas Miehling, Sjors H.W. Scheres, Lingbo Yu, Ioana T. Grigoras, Jeroen Keizer, Evgeniya V. Pechnikova, Greg McMullan, and Tomas Malinauskas
- Subjects
0303 health sciences ,Multidisciplinary ,Materials science ,Cryo-electron microscopy ,Drug discovery ,Resolution (electron density) ,Small molecule ,Article ,03 medical and health sciences ,Structural bioinformatics ,0302 clinical medicine ,Protein structure ,Chemical physics ,Particle ,Molecule ,030217 neurology & neurosurgery ,030304 developmental biology - Abstract
The three-dimensional positions of atoms in protein molecules define their structure and the roles they perform in biological processes. The more precisely atomic coordinates are determined, the more chemical information can be derived and the more mechanistic insights into protein function may be inferred. With breakthroughs in electron detection and image processing technology, electron cryo-microscopy (cryo-EM) single-particle analysis has yielded protein structures with increasing levels of detail in recent years(1,2). However, obtaining cryo-EM reconstructions with sufficient resolution to visualise individual atoms in proteins has thus far been elusive. Here, we show that using a new electron source, energy filter and camera, we obtained a 1.7 Å resolution cryo-EM reconstruction for a human membrane protein, the β3 GABA(A) receptor homopentamer(3). Such maps allow a detailed understanding of small molecule coordination, visualisation of solvent molecules and alternative conformations for multiple amino acids, as well as unambiguous building of ordered acidic side chains and glycans. Applied to mouse apoferritin, our strategy led to a 1.22 Å resolution reconstruction that, for the first time, offers a genuine atomic resolution view of a protein molecule using single particle cryo-EM. Moreover, the scattering potential from many hydrogen atoms can be visualised in difference maps, allowing a direct analysis of hydrogen bonding networks. The technological advances described here, combined with further approaches to accelerate data acquisition and improve sample quality, provide a route towards routine application of cryo-EM in high-throughput screening of small molecule modulators and structure-based drug discovery.
- Published
- 2020
38. QChromosomeVisualizer: A new tool for 3D visualization of long simulations of polymer-like chromosome models
- Author
-
Bartek Wilczyński, Bartłomiej Zawalski, and Irina Tuszynska
- Subjects
Polymers ,Computer science ,Molecular Conformation ,Chromosomes ,General Biochemistry, Genetics and Molecular Biology ,Field (computer science) ,Chromatin Assembly ,03 medical and health sciences ,Structural bioinformatics ,Imaging, Three-Dimensional ,Software ,Human–computer interaction ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,business.industry ,Data Visualization ,030302 biochemistry & molecular biology ,Virtual Reality ,Computational Biology ,3D modeling ,Chromatin ,Data availability ,Visualization ,Microscopic imaging ,business - Abstract
Recent years have brought us a great wealth of new types of experimental data on different aspects of chromatin state, from chromosome conformation assays, through super-resolution microscopic imaging, to epigenetic modifications and lamina interaction assays. This rapid increase in data availability has motivated many novel approaches to 3D modeling of chromosomes, their conformations and dynamic behavior. Although many tools have already been developed for molecular visualization in the field of structural bioinformatics, they are usually optimized for visualizing smaller molecules (like proteins) and much shorter trajectories. We have developed a novel approach to visualizing long trajectories of large polymers, typical in the field of chromatin modeling. Our software, called QChromosomeVisualizer (QCV), allows quick visualization of long simulations containing thousands or even millions of frames and generates good-looking still images and animations, including spherical 360° videos that can be viewed in VR headsets. We believe that this kind of tool will be helpful for the broader community of researchers interested in modeling by allowing them to create new and clearer ways to communicate their results.
- Published
- 2020
39. Study of model problem of structural bioinformatics
- Author
-
Andrey Chepurnov and Nikolay Ershov
- Subjects
Structural bioinformatics ,Computer science ,Computational biology - Abstract
The paper is devoted to the study of methods for solving problems of structural bioinformatics, using as an example a model problem of graph layout on a plane. The paper considers an "energy" approach to solving this type of problem, based on continuous optimization methods whose purpose is to find a configuration with minimum energy. The paper formulates a model problem of graph layout, describes the structure of the graphs to be processed, and defines an objective function that simulates the internal energy of a graph layout. Several popular optimization methods are described, including a genetic algorithm and a differential evolution algorithm. Parallel variants of these two algorithms are considered. The implementation of a software system for automatically testing a user-defined algorithm for solving model folding problems, with support for parallel computing, a web interface and visualization of computations, is described. The work was carried out with the financial support of the Russian Foundation for Basic Research (Grant No. 20-07-01053 A).
- Published
- 2020
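The "energy" approach described by Chepurnov and Ershov can be sketched as follows. The energy function here (springs on edges plus pairwise repulsion) is an illustrative assumption, not the paper's objective function, and the optimizer is a textbook serial DE/rand/1/bin differential evolution; the parallel variants discussed in the paper are not shown, and the function names are hypothetical.

```python
import math
import random

def layout_energy(coords, n, edges, rest_length=1.0, repulsion=0.1):
    """Hypothetical 'energy' of a 2D graph layout given as a flat list
    [x0, y0, x1, y1, ...]: springs on edges plus pairwise repulsion."""
    pos = [(coords[2 * i], coords[2 * i + 1]) for i in range(n)]
    energy = 0.0
    for i, j in edges:
        d = math.dist(pos[i], pos[j])
        energy += (d - rest_length) ** 2  # spring term keeps bonded nodes near rest length
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(pos[i], pos[j]) + 1e-9  # avoid division by zero
            energy += repulsion / d  # repulsion keeps all nodes apart
    return energy

def differential_evolution(f, dim, pop_size=30, generations=200,
                           F=0.8, CR=0.9, bounds=(-3.0, 3.0), seed=0):
    """Serial DE/rand/1/bin minimizer (no bound handling after initialization)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [f(x) for x in pop]
    for _ in range(generations):
        for k in range(pop_size):
            # Three distinct donors, all different from the target vector k.
            a, b, c = rng.sample([m for m in range(pop_size) if m != k], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = [
                pop[a][j] + F * (pop[b][j] - pop[c][j])
                if (rng.random() < CR or j == j_rand) else pop[k][j]
                for j in range(dim)
            ]
            s = f(trial)
            if s <= scores[k]:  # greedy one-to-one selection
                pop[k], scores[k] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

For a four-node cycle, `differential_evolution(lambda x: layout_energy(x, 4, [(0, 1), (1, 2), (2, 3), (3, 0)]), dim=8)` should settle near a roughly square layout. In a parallel version, the fitness evaluations of one generation are the natural unit to distribute.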
40. CSynth: an interactive modelling and visualization tool for 3D chromatin structure
- Author
-
Peter Todd, Stephen Todd, William Latham, Stephen Taylor, Frederic Fol Leymarie, Yasutaka Kakui, Jim R. Hughes, and Simon J. McGowan
- Subjects
Statistics and Probability ,AcademicSubjects/SCI01060 ,Computer science ,Cell ,Molecular Conformation ,Genome browser ,Biochemistry ,Genome ,Chromosomes ,Chromosome conformation capture ,03 medical and health sciences ,0302 clinical medicine ,Software ,Human–computer interaction ,Component (UML) ,Gene expression ,medicine ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,business.industry ,MODELLER ,Original Papers ,Structural Bioinformatics ,Chromatin ,Computer Science Applications ,Visualization ,Computational Mathematics ,medicine.anatomical_structure ,Computational Theory and Mathematics ,business ,030217 neurology & neurosurgery - Abstract
Motivation The 3D structure of chromatin in the nucleus is important for gene expression and regulation. Chromosome conformation capture techniques, such as Hi-C, generate large amounts of data showing interaction points on the genome, but these are hard to interpret using standard tools. Results We have developed CSynth, an interactive 3D genome browser and real-time chromatin restraint-based modeller to visualize models of any chromosome conformation capture (3C) data. Unlike other modelling systems, CSynth allows dynamic interaction with the modelling parameters, so that users can experiment with them and observe their effects on the model. It also allows comparison of models generated from data in different tissues/cell states and of the results of third-party 3D modelling outputs. In addition, we include an option to view and manipulate these complicated structures using Virtual Reality (VR) so scientists can immerse themselves in the models for further understanding. This VR component has also proven to be a valuable teaching and public engagement tool. Availability and implementation CSynth is web based and available to use at csynth.org. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2020
41. CLoNe: automated clustering based on local density neighborhoods for application to biomolecular structural ensembles
- Author
-
Giorgio E. Tamò, Matteo Dal Peraro, Giulia Fonti, Deniz Aydin, Sylvain Träger, and Martina Audagnotto
- Subjects
Statistics and Probability ,Theoretical computer science ,AcademicSubjects/SCI01060 ,Computer science ,01 natural sciences ,Biochemistry ,03 medical and health sciences ,Clone (algebra) ,0103 physical sciences ,Cluster (physics) ,Cluster Analysis ,Probabilistic analysis of algorithms ,Cluster analysis ,Molecular Biology ,030304 developmental biology ,Flexibility (engineering) ,0303 health sciences ,010304 chemical physics ,Proteins ,Original Papers ,Structural Bioinformatics ,Clone Cells ,Computer Science Applications ,Computational Mathematics ,ComputingMethodologies_PATTERNRECOGNITION ,Computational Theory and Mathematics ,Key (cryptography) ,Algorithms ,Software - Abstract
Motivation Proteins are intrinsically dynamic entities. Flexibility sampling methods, such as molecular dynamics or those arising from integrative modeling strategies, are now commonplace and enable the study of molecular conformational landscapes in many contexts. Resulting structural ensembles increase in size as technological and algorithmic advancements take place, making their analysis increasingly demanding. In this regard, cluster analysis remains a go-to approach for their classification. However, many state-of-the-art algorithms are restricted to specific cluster properties. Combined with tedious parameter fine-tuning, cluster analysis of protein structural ensembles suffers from the lack of a generally applicable and easy-to-use clustering scheme. Results We present CLoNe, an original Python-based clustering scheme that builds on the Density Peaks algorithm of Rodriguez and Laio. CLoNe relies on a probabilistic analysis of local density distributions derived from nearest neighbors to find relevant clusters regardless of cluster shape, size, distribution and number. We show its capabilities on many toy datasets with properties that otherwise divide state-of-the-art approaches, and show that it improves on the original algorithm in key aspects. Applied to structural ensembles, CLoNe was able to extract meaningful conformations from membrane binding events and ligand-binding pocket opening, as well as identify dominant dimerization motifs or inter-domain organization. CLoNe additionally saves clusters as individual trajectories for further analysis and provides scripts for automated use with molecular visualization software. Availability and implementation www.epfl.ch/labs/lbm/resources, github.com/LBM-EPFL/CLoNe. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2020
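CLoNe builds on the Density Peaks algorithm of Rodriguez and Laio. The sketch below shows only that original algorithm, with a hard cutoff distance `dc` and a user-chosen number of clusters (exactly the kind of hand-tuned parameters CLoNe aims to avoid); it is not CLoNe's code and omits its probabilistic analysis of local density distributions. The function name is hypothetical.

```python
import math

def density_peaks(points, dc, n_clusters):
    """Minimal Density Peaks clustering with a cutoff kernel.
    Returns one integer cluster label per input point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho: number of neighbours closer than the cutoff dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the global density maximum
            continue
        # delta: distance to the nearest point that comes earlier (denser) in the order.
        j = min(order[:rank], key=lambda h: dist[i][h])
        delta[i], nearest_higher[i] = dist[i][j], j
    # Centres: the global density maximum plus the points with the largest rho * delta.
    rest = sorted(order[1:], key=lambda i: rho[i] * delta[i], reverse=True)
    centres = [order[0]] + rest[: n_clusters - 1]
    labels = [-1] * n
    for c_id, c in enumerate(centres):
        labels[c] = c_id
    # Each remaining point inherits the label of its nearest denser neighbour,
    # which is already labelled because points are visited in density order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels
```

On two well-separated groups of four points, for example `density_peaks([(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1)], dc=0.5, n_clusters=2)`, each group receives its own label. For structural ensembles, `points` would be replaced by a pairwise structural distance such as RMSD.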
42. GraphQA: protein model quality assessment using graph convolutional networks
- Author
-
Hossein Azizpour, Arne Elofsson, David Menéndez Hurtado, and Federico Baldassarre
- Subjects
Statistics and Probability ,Protein Folding ,AcademicSubjects/SCI01060 ,Computer science ,computer.software_genre ,Biochemistry ,03 medical and health sciences ,0302 clinical medicine ,Molecule ,Molecular Biology ,030304 developmental biology ,Supplementary data ,0303 health sciences ,Quality assessment ,Proteins ,A protein ,Original Papers ,Structural Bioinformatics ,Graph ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Protein model ,Graph (abstract data type) ,Protein folding ,Neural Networks, Computer ,Data mining ,computer ,Feature learning ,030217 neurology & neurosurgery - Abstract
Motivation Proteins are ubiquitous molecules whose function in biological processes is determined by their 3D structure. Experimental identification of a protein’s structure can be time-consuming, prohibitively expensive and not always possible. Alternatively, protein folding can be modeled using computational methods, which however are not guaranteed to always produce optimal results. GraphQA is a graph-based method to estimate the quality of protein models, that possesses favorable properties such as representation learning, explicit modeling of both sequential and 3D structure, geometric invariance and computational efficiency. Results GraphQA performs similarly to state-of-the-art methods despite using a relatively low number of input features. In addition, the graph network structure provides an improvement over the architecture used in ProQ4 operating on the same input features. Finally, the individual contributions of GraphQA components are carefully evaluated. Availability and implementation PyTorch implementation, datasets, experiments and link to an evaluation server are available through this GitHub repository: github.com/baldassarreFe/graphqa. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2020
43. A semi-supervised learning framework for quantitative structure–activity regression modelling
- Author
-
Isidro Cortes-Ciriano, James A Watson, and Oliver P. Watson
- Subjects
Statistics and Probability ,Quantitative structure–activity relationship ,AcademicSubjects/SCI01060 ,Computer science ,media_common.quotation_subject ,Plasmodium falciparum ,Quantitative Structure-Activity Relationship ,Semi-supervised learning ,Machine learning ,computer.software_genre ,01 natural sciences ,Biochemistry ,Set (abstract data type) ,Antimalarials ,03 medical and health sciences ,Drug Discovery ,Similarity (psychology) ,Representation (mathematics) ,Molecular Biology ,030304 developmental biology ,media_common ,Selection bias ,0303 health sciences ,Training set ,business.industry ,Drug discovery ,Supervised learning ,Regression analysis ,Original Papers ,Structural Bioinformatics ,Small molecule ,0104 chemical sciences ,Computer Science Applications ,010404 medicinal & biomolecular chemistry ,Computational Mathematics ,Computational Theory and Mathematics ,Supervised Machine Learning ,Artificial intelligence ,business ,computer - Abstract
Motivation Quantitative structure–activity relationship (QSAR) methods are increasingly used in assisting the process of preclinical, small molecule drug discovery. Regression models are trained on data consisting of a finite-dimensional representation of molecular structures and their corresponding target-specific activities. These supervised learning models can then be used to predict the activity of previously unmeasured novel compounds. Results This work provides methods that solve three problems in QSAR modelling: (i) a method for comparing the information content between finite-dimensional representations of molecular structures (fingerprints) with respect to the target of interest, (ii) a method that quantifies how the accuracy of the model prediction degrades as a function of the distance between the testing and training data and (iii) a method to adjust for screening-dependent selection bias inherent in many training datasets. For example, in the most extreme cases, only compounds that pass an activity-dependent screening threshold are reported. A semi-supervised learning framework combines (ii) and (iii) and can make predictions that take into account the similarity of the testing compounds to those in the training data and adjust for the reporting selection bias. We illustrate the three methods using publicly available structure–activity data for a large set of compounds reported by GlaxoSmithKline (the Tres Cantos AntiMalarial Set, TCAMS) to inhibit asexual in vitro Plasmodium falciparum growth. Availability and implementation https://github.com/owatson/PenalizedPrediction. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2020
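Method (ii) in the QSAR abstract relates prediction accuracy to the distance between testing and training compounds. One common way to measure that distance for binary fingerprints is the Tanimoto similarity; the sketch below is an illustrative choice, not necessarily the metric used in the paper or its repository. It represents each fingerprint as a set of on-bit indices and computes a compound's similarity to its nearest training-set neighbour. Both function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two binary fingerprints given as
    sets of on-bit indices; two empty fingerprints are assigned 0.0 by convention."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def max_similarity_to_training(query, training):
    """Similarity of a query compound to its nearest neighbour in the training set."""
    return max(tanimoto(query, fp) for fp in training)
```

With `training = [{1, 2, 3, 4}, {10, 11, 12}]`, the query `{1, 2, 3, 5}` shares three of five distinct bits with its nearest neighbour, giving a similarity of 0.6. A model in the spirit of method (ii) would then expect larger prediction errors as this value falls.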
44. Identification of population-level differentially expressed genes in one-phenotype data
- Author
-
Zheng Guo, Jiajing Xie, Jie Xia, Haidan Yan, Qingzhou Guan, Hui Liu, Yang Xu, Meifeng Li, Haifeng Chen, Jun He, and Meirong Chi
- Subjects
Statistics and Probability ,AcademicSubjects/SCI01060 ,Population level ,Computational biology ,Biology ,Biochemistry ,03 medical and health sciences ,0302 clinical medicine ,Differential expression ,Molecular Biology ,030304 developmental biology ,Supplementary data ,0303 health sciences ,Gold standard (test) ,Original Papers ,Structural Bioinformatics ,Phenotype ,Computer Science Applications ,Computational Mathematics ,Identification (information) ,Differentially expressed genes ,Computational Theory and Mathematics ,Significance analysis of microarrays ,Algorithms ,030217 neurology & neurosurgery - Abstract
Motivation For some specific tissues, such as the heart and brain, normal controls are difficult to obtain. Thus, studies with only a particular type of disease samples (one phenotype) cannot be analyzed using common methods, such as significance analysis of microarrays, edgeR and limma. The RankComp algorithm, which was mainly developed to identify individual-level differentially expressed genes (DEGs), can be applied to identify population-level DEGs for one-phenotype data but cannot identify the dysregulation directions of DEGs. Results Here, we optimized the RankComp algorithm into a method termed PhenoComp. Compared with RankComp, PhenoComp provided the dysregulation directions of DEGs and had more robust detection power in both simulated and real one-phenotype data. Moreover, using the DEGs detected by common methods as the ‘gold standard’, the results showed that the DEGs detected by PhenoComp using only one-phenotype data were comparable to those identified by common methods using case-control samples, independent of the measurement platform. PhenoComp also exhibited good performance for data with weak differential expression signals. Availability and implementation The PhenoComp algorithm is available on the web at https://github.com/XJJ-student/PhenoComp. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2020
45. Structural Consequences of Disease‐Related Mutations for Protein–Protein Interactions
- Author
-
Mireia Rosell and Juan Fernández-Recio
- Subjects
Genetics ,Structural bioinformatics ,Disease ,Biology ,Protein–protein interaction - Published
- 2020
46. Protein Structure Prediction Using Robust Principal Component Analysis and Support Vector Machine
- Author
-
Zuraini Ali Shah, Shahreen Kasim, and Nur Aini Zakaria
- Subjects
business.industry ,Computer science ,Feature extraction ,Pattern recognition ,Protein structure prediction ,Protein tertiary structure ,Support vector machine ,Structural bioinformatics ,Principal component analysis ,Radial basis function kernel ,General Earth and Planetary Sciences ,Artificial intelligence ,business ,Robust principal component analysis ,General Environmental Science - Abstract
Bioinformatics exists to further our understanding of biological processes. Protein structure is one of the major challenges in structural bioinformatics. With prior knowledge of the structure, the quality of secondary-structure prediction, tertiary-structure prediction and function prediction from the amino acid sequence increases significantly. Recently, the gap between the number of proteins with known sequences and those with known structures has increased dramatically. It is therefore essential to understand protein structure to overcome this problem, so that further functional analysis becomes easier. This research applies the robust principal component analysis (RPCA) algorithm to extract the essential features from the original high-dimensional input vectors, followed by a support vector machine (SVM) with a radial basis function (RBF) kernel. The proposed method obtains an accuracy of 84.41% on the training dataset and 89.09% on the testing dataset. The results were then compared with those of the same method using PCA for feature extraction. The prediction is assessed by analyzing the accuracy and the number of principal components selected. The results show that the combination of RPCA and SVM produces a high-quality classification of protein structure.
- Published
- 2020
47. Structural Bioinformatics Predicts Large Intrinsically Disordered Regions of Erythrocyte Binding-Like Proteins of Plasmodium sp.: Functional Implications
- Author
-
Arita Acharjee
- Subjects
Structural bioinformatics ,Biochemistry ,Chemistry ,Erythrocyte binding ,Plasmodium sp - Published
- 2020
48. Human papillomavirus E6: Host cell receptor, GRP78, binding site prediction
- Author
-
Abdo A. Elfiky
- Subjects
GRP78 ,BiP ,Cellular differentiation ,Biology ,03 medical and health sciences ,0302 clinical medicine ,Downregulation and upregulation ,protein‐protein docking ,Virology ,Heat shock protein ,medicine ,030212 general & internal medicine ,Binding site ,Receptor ,Gene ,Research Articles ,HPV infection ,structural bioinformatics ,medicine.disease ,HPV E6 ,Infectious Diseases ,Unfolded protein response ,Cancer research ,030211 gastroenterology & hepatology ,Research Article - Abstract
Human papillomavirus (HPV) is the main cervical cancer-promoting agent and is transmitted through sexual routes. Anal, head, and throat cancers are also reported to be accompanied by HPV infection. E6, one of the HPV nonstructural proteins, is responsible for cell differentiation by targeting the tumor suppressor genes p105Rb and p53. E6 was reported to be stabilized by two chaperone proteins: glucose-regulated protein 78 (GRP78) and heat shock protein 90. GRP78 is responsible for the unfolded protein response of the cells, and it was reported to be upregulated in many cancers, including cervical cancer. It was reported that knocking out GRP78 destabilizes E6, leading to faster degradation of E6 in vivo. The current work predicts the possible binding mode between E6 and GRP78 based on sequence and structural similarities. Research Highlights: Understanding the binding behavior of glucose-regulated protein 78 (GRP78) can pave the way for drugs that prevent the interaction between this host cell protein and the viral protein E6, which is crucial for cancer propagation. Once GRP78 binding is inhibited, E6 will be destabilized and prone to faster degradation by the host cell degradation machinery.
- Published
- 2020
49. Rhapsody: predicting the pathogenicity of human missense variants
- Author
-
Daniel A Peñaherrera, Luca Ponzoni, Ivet Bahar, and Zoltán N. Oltvai
- Subjects
Statistics and Probability ,Computer science ,Interface (Java) ,In silico ,Documentation ,Computational biology ,Biochemistry ,03 medical and health sciences ,0302 clinical medicine ,Mutant protein ,Humans ,Missense mutation ,Computer Simulation ,Saturated mutagenesis ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,Virulence ,Protein dynamics ,Computational Biology ,Molecular diagnostics ,Pathogenicity ,Original Papers ,Structural Bioinformatics ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Software ,030217 neurology & neurosurgery - Abstract
Motivation The biological effects of human missense variants have been studied experimentally for decades but predicting their effects in clinical molecular diagnostics remains challenging. Available computational tools are usually based on the analysis of sequence conservation and structural properties of the mutant protein. We recently introduced a new machine learning method that demonstrated for the first time the significance of protein dynamics in determining the pathogenicity of missense variants. Results Here, we present a new interface (Rhapsody) that enables fully automated assessment of pathogenicity, incorporating both sequence coevolution data and structure- and dynamics-based features. Benchmarked against a dataset of about 20 000 annotated variants, the methodology is shown to outperform well-established and/or advanced prediction tools. We illustrate the utility of Rhapsody by in silico saturation mutagenesis studies of human H-Ras, phosphatase and tensin homolog and thiopurine S-methyltransferase. Availability and implementation The new tool is available both as an online webserver at http://rhapsody.csb.pitt.edu and as an open-source Python package (GitHub repository: https://github.com/prody/rhapsody; PyPI package installation: pip install prody-rhapsody). Links to additional resources, tutorials and package documentation are provided in the 'Python package' section of the website. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2020
50. atomium—a Python structure parser
- Author
-
Andrew J. Martin and Sam M. Ireland
- Subjects
Statistics and Probability ,Computer science ,Protein Data Bank (RCSB PDB) ,computer.software_genre ,Biochemistry ,03 medical and health sciences ,Software ,Databases, Protein ,Molecular Biology ,030304 developmental biology ,computer.programming_language ,0303 health sciences ,Parsing ,Molecular Structure ,Programming language ,business.industry ,030302 biochemistry & molecular biology ,computer.file_format ,Python (programming language) ,Protein Data Bank ,File format ,Original Papers ,Structural Bioinformatics ,JSON ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Structural biology ,business ,computer ,Macromolecule - Abstract
Summary Structural biology relies on specific file formats to convey information about macromolecular structures. Traditionally this has been the PDB format, but increasingly newer formats, such as PDBML, mmCIF and MMTF, are being used. Here we present atomium, a modern, lightweight Python library for parsing, manipulating and saving PDB, mmCIF and MMTF file formats. In addition, we provide a web service, pdb2json, which uses atomium to give a consistent JSON representation to the entire Protein Data Bank. Availability and implementation atomium is implemented in Python and its performance is equivalent to the existing library BioPython. However, it has significant advantages in features and API design. atomium is available from atomium.bioinf.org.uk and pdb2json can be accessed at pdb2json.bioinf.org.uk. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2020
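atomium's own API is not reproduced here. As a hedged illustration of what parsing the legacy PDB format involves, the sketch below reads ATOM/HETATM records using that format's fixed column positions. It ignores alternate locations, insertion codes, multiple models, and the mmCIF/MMTF formats that atomium also supports; the `Atom` type and `parse_pdb_atoms` function are hypothetical.

```python
from typing import NamedTuple

class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str

def parse_pdb_atoms(lines):
    """Parse ATOM/HETATM records by fixed columns (0-based slices of the
    1-based PDB column ranges); all other record types are skipped."""
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atoms.append(Atom(
            serial=int(line[6:11]),          # columns 7-11
            name=line[12:16].strip(),        # columns 13-16
            res_name=line[17:20].strip(),    # columns 18-20
            chain=line[21],                  # column 22
            res_seq=int(line[22:26]),        # columns 23-26
            x=float(line[30:38]),            # columns 31-38
            y=float(line[38:46]),            # columns 39-46
            z=float(line[46:54]),            # columns 47-54
            element=line[76:78].strip(),     # columns 77-78
        ))
    return atoms
```

Fixed-column parsing like this is one reason newer formats such as mmCIF and MMTF were introduced: column widths limit, for example, the number of atoms and the length of chain identifiers a PDB file can represent.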
Discovery Service for Jio Institute Digital Library