21 results on '"Flachsenberg F"'
Search Results
2. SpaceGrow: efficient shape-based virtual screening of billion-sized combinatorial fragment spaces.
- Author
- Hönig SMN, Flachsenberg F, Ehrt C, Neumann A, Schmidt R, Lemmen C, and Rarey M
- Subjects
- Ligands, Small Molecule Libraries chemistry, Drug Discovery methods
- Abstract
The growing size of make-on-demand chemical libraries is posing new challenges to cheminformatics. These ultra-large chemical libraries became too large for exhaustive enumeration. Using a combinatorial approach instead, the resource requirement scales approximately with the number of synthons instead of the number of molecules. This gives access to billions or trillions of compounds as so-called chemical spaces with moderate hardware and in a reasonable time frame. While extremely performant ligand-based 2D methods exist in this context, 3D methods still largely rely on exhaustive enumeration and therefore fail to apply. Here, we present SpaceGrow: a novel shape-based 3D approach for ligand-based virtual screening of billions of compounds within hours on a single CPU. Compared to a conventional superposition tool, SpaceGrow shows comparable pose reproduction capacity based on RMSD and superior ranking performance while being orders of magnitude faster. Result assessment of two differently sized subsets of the eXplore space reveals a higher probability of finding superior results in larger spaces highlighting the potential of searching in ultra-large spaces. Furthermore, the application of SpaceGrow in a drug discovery workflow was investigated in four examples involving G protein-coupled receptors (GPCRs) with the aim to identify compounds with similar binding capabilities and molecular novelty. (© 2024. The Author(s).)
- Published
- 2024
3. Redocking the PDB.
- Author
- Flachsenberg F, Ehrt C, Gutermuth T, and Rarey M
- Subjects
- Molecular Docking Simulation, Binding Sites, Ligands, Protein Binding, Drug Design
- Abstract
Molecular docking is a standard technique in structure-based drug design (SBDD). It aims to predict the 3D structure of a small molecule in the binding site of a receptor (often a protein). Despite being a common technique, it often necessitates multiple tools and involves manual steps. Here, we present the JAMDA preprocessing and docking workflow that is easy to use and allows fully automated docking. We evaluate the JAMDA docking workflow on binding sites extracted from the complete PDB and derive key factors determining JAMDA's docking performance. With that, we try to remove most of the bias due to manual intervention and provide a realistic estimate of the redocking performance of our JAMDA preprocessing and docking workflow for any PDB structure. On this large PDBScan22 data set, our JAMDA workflow finds a pose with an RMSD of at most 2 Å to the crystal ligand on the top rank for 30.1% of the structures. When applying objective structure quality filters to the PDBScan22 data set, the success rate increases to 61.8%. Given the prepared structures from the JAMDA preprocessing pipeline, both JAMDA and the widely used AutoDock Vina perform comparably on this filtered data set (the PDBScan22-HQ data set).
- Published
- 2024
4. LifeSoaks: a tool for analyzing solvent channels in protein crystals and obstacles for soaking experiments.
- Author
- Pletzer-Zelgert J, Ehrt C, Fender I, Griewel A, Flachsenberg F, Klebe G, and Rarey M
- Subjects
- Solvents, Proteins chemistry
- Abstract
Due to the structural complexity of proteins, their corresponding crystal arrangements generally contain a significant amount of solvent-occupied space. These areas allow a certain degree of intracrystalline protein flexibility and mobility of solutes. Therefore, knowledge of the geometry of solvent-filled channels and cavities is essential whenever the dynamics inside a crystal are of interest. Especially in soaking experiments for structure-based drug design, ligands must be able to traverse the crystal solvent channels and reach the corresponding binding pockets. Unsuccessful screenings are sometimes attributed to the geometry of the crystal packing, but the underlying causes are often difficult to understand. This work presents LifeSoaks, a novel tool for analyzing and visualizing solvent channels in protein crystals. LifeSoaks uses a Voronoi diagram-based periodic channel representation which can be efficiently computed. The size and location of channel bottlenecks, which might hinder molecular diffusion, can be directly derived from this representation. This work presents the calculated bottleneck radii for all crystal structures in the PDB and the analysis of a new, hand-curated data set of structures obtained by soaking experiments. The results indicate that the consideration of bottleneck radii and the visual inspection of channels are beneficial for planning soaking experiments. (Open access.)
- Published
- 2023
5. FastGrow: on-the-fly growing and its application to DYRK1A.
- Author
- Penner P, Martiny V, Bellmann L, Flachsenberg F, Gastreich M, Theret I, Meyer C, and Rarey M
- Subjects
- Algorithms, Ligands, Drug Design, Software
- Abstract
Fragment-based drug design is an established routine approach in both experimental and computational spheres. Growing fragment hits into viable ligands has increasingly shifted into the spotlight. FastGrow is an application based on a shape search algorithm that addresses this challenge at high speeds of a few milliseconds per fragment. It further features a pharmacophoric interaction description, ensemble flexibility, as well as geometry optimization to become a fully fledged structure-based modeling tool. All features were evaluated in detail on a previously reported collection of fragment growing scenarios extracted from crystallographic data. FastGrow was also shown to perform competitively versus established docking software. A case study on the DYRK1A kinase, using recently reported new chemotypes, illustrates FastGrow's features in practice and its ability to identify active fragments. FastGrow is freely available to the public as a web server at https://fastgrow.plus/ and is part of the SeeSAR 3D software package. (© 2022. The Author(s).)
- Published
- 2022
6. ProteinsPlus: a comprehensive collection of web-based molecular modeling tools.
- Author
- Schöning-Stierand K, Diedrich K, Ehrt C, Flachsenberg F, Graef J, Sieg J, Penner P, Poppinga M, Ungethüm A, and Rarey M
- Subjects
- Molecular Docking Simulation, Models, Molecular, Internet, Software, Proteins chemistry
- Abstract
With the ever-increasing number of publicly available experimentally determined and predicted protein and nucleic acid structures, the demand for easy-to-use tools to investigate these structural models is higher than ever before. The ProteinsPlus web server (https://proteins.plus) comprises a growing collection of molecular modeling tools focusing on protein-ligand interactions. It enables quick access to structural investigations ranging from structure analytics and search methods to molecular docking. It is by now well-established in the community and constantly extended. The server gives easy access not only to experts but also to students and occasional users from the field of life sciences. Here, we describe its recently added new features and tools, among them a novel method for on-the-fly molecular docking and a search method for single-residue substitutions in local regions of a protein structure throughout the whole Protein Data Bank. Finally, we provide a glimpse into new avenues for the annotation of AlphaFold structures which are directly accessible via a RESTful service on the ProteinsPlus web server. (© The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2022
7. LSLOpt: An open-source implementation of the step-length controlled LSL-BFGS algorithm.
- Author
- Flachsenberg F and Rarey M
- Abstract
Numerical optimization is a common technique in various areas of computational chemistry, molecular modeling and drug design. It is a key element of 3D techniques, for example, the optimization of protein-ligand poses and small-molecule conformers. Here, often the BFGS algorithm or variants thereof are used. However, the BFGS algorithm tends to make unreasonably large changes to the optimized system under certain circumstances. This behavior has been known for a long time and different solutions have been suggested. Recently, we have analyzed the optimization behavior of our novel JAMDA scoring function in detail and proposed the limited step length (LSL)-BFGS algorithm as a new solution to the problem of excessively large steps during optimization. The LSL-BFGS algorithm allows control of the step sizes during optimization. Its unique feature is the inclusion of arbitrary domain knowledge into the selection of the step sizes. Here, we introduce the open-source LSLOpt C++ library that implements this LSL-BFGS algorithm and demonstrate its usage. (© 2021 The Authors. Journal of Computational Chemistry published by Wiley Periodicals LLC.)
- Published
- 2021
8. A Consistent Scheme for Gradient-Based Optimization of Protein-Ligand Poses.
- Author
- Flachsenberg F, Meyder A, Sommer K, Penner P, and Rarey M
- Subjects
- Benchmarking, Ligands, Molecular Docking Simulation, Protein Binding, Algorithms, Proteins metabolism
- Abstract
Scoring and numerical optimization of protein-ligand poses is an integral part of docking tools. Although many scoring functions exist, many of them are not continuously differentiable and they are rarely explicitly analyzed with respect to their numerical optimization behavior. Here, we present a consistent scheme for pose scoring and gradient-based pose optimization. It consists of a novel variant of the BFGS algorithm enabling step-length control, named LSL-BFGS (limited step length BFGS), and the empirical JAMDA scoring function designed for pose prediction and good numerical optimizability. The JAMDA scoring function shows a high pose prediction performance in the CASF-2016 docking power benchmark, top-ranking a pose with an RMSD of ≤2 Å in about 89% of the cases. The combination of JAMDA scoring with the LSL-BFGS algorithm shows a significantly higher optimization locality (i.e., no excessive movement of poses) than with the classical BFGS algorithm while retaining the characteristically low number of scoring function evaluations. The JAMDA scoring and optimization scheme is freely available for noncommercial use and academic research.
- Published
- 2020
9. ProteinsPlus: interactive analysis of protein-ligand binding interfaces.
- Author
- Schöning-Stierand K, Diedrich K, Fährrolfes R, Flachsenberg F, Meyder A, Nittinger E, Steinegger R, and Rarey M
- Subjects
- Binding Sites, Ligands, Metals chemistry, Models, Molecular, Proteins metabolism, Water chemistry, Proteins chemistry, Software
- Abstract
Due to the increasing number of publicly available protein structures, searching, enriching and investigating these data still poses a challenging task. The ProteinsPlus web service (https://proteins.plus) offers a broad range of tools addressing these challenges. The web interface to the tool collection focusing on protein-ligand interactions has been geared towards easy and intuitive access to a large variety of functionality for life scientists. Since our last publication, the ProteinsPlus web service has been extended by additional services and has undergone substantial infrastructural improvements. A keyword search functionality was added on the start page of ProteinsPlus enabling users to work on structures without knowing their PDB code. The tool collection has been augmented by three tools: StructureProfiler validates ligands and active sites using selection criteria of well-established protein-ligand benchmark data sets, WarPP places water molecules in the ligand binding sites of a protein, and METALizer calculates, predicts and scores coordination geometries of metal ions based on surrounding complex atoms. Additionally, all tools provided by ProteinsPlus are available through a REST service enabling the automated integration in structure processing and modeling pipelines. (© The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2020
10. Structural and biophysical characterization of the type VII collagen vWFA2 subdomain leads to identification of two binding sites.
- Author
- Gebauer JM, Flachsenberg F, Windler C, Richer B, Baumann U, and Seeger K
- Subjects
- Amino Acid Motifs, Amino Acid Sequence, Animals, Autoantibodies immunology, Binding Sites, Blister immunology, Blister metabolism, Cell Adhesion, Collagen Type I metabolism, Epidermolysis Bullosa Acquisita immunology, Epidermolysis Bullosa Acquisita metabolism, Extracellular Matrix metabolism, HaCaT Cells, Humans, Integrin beta1 chemistry, Integrin beta1 metabolism, Laminin metabolism, Mice, Protein Binding, Skin metabolism, von Willebrand Factor immunology, Collagen Type VII chemistry, Collagen Type VII metabolism, Protein Domains immunology, von Willebrand Factor chemistry, von Willebrand Factor metabolism
- Abstract
Type VII collagen is an extracellular matrix protein, which is important for skin stability; however, detailed information at the molecular level is scarce. The second vWFA (von Willebrand factor type A) domain of type VII collagen mediates important interactions, and immunization of mice induces skin blistering in certain strains. To understand vWFA2 function and the pathophysiological mechanisms leading to skin blistering, we structurally characterized this domain by X-ray crystallography and NMR spectroscopy. Cell adhesion assays identified two new interactions: one with β1 integrin via its RGD motif and one with laminin-332. The latter interaction was confirmed by surface plasmon resonance with a KD of about 1 mM. These data show that vWFA2 has additional functions in the extracellular matrix besides interacting with type I collagen. (© 2020 The Authors. Published by FEBS Press and John Wiley & Sons Ltd.)
- Published
- 2020
11. In Need of Bias Control: Evaluating Chemical Data for Machine Learning in Structure-Based Virtual Screening.
- Author
- Sieg J, Flachsenberg F, and Rarey M
- Subjects
- Benchmarking methods, Databases, Factual, Ligands, Molecular Docking Simulation methods, Molecular Structure, Retrospective Studies, Structure-Activity Relationship, Bias, Machine Learning
- Abstract
Reports of successful applications of machine learning (ML) methods in structure-based virtual screening (SBVS) are increasing. ML methods such as convolutional neural networks show promising results and often outperform traditional methods such as empirical scoring functions in retrospective validation. However, trained ML models are often treated as black boxes and are not straightforwardly interpretable. In most cases, it is unknown which features in the data are decisive and whether a model's predictions are right for the right reason. Hence, we re-evaluated three widely used benchmark data sets in the context of ML methods and came to the conclusion that not every benchmark data set is suitable. Moreover, we demonstrate on two examples from current literature that bias is learned implicitly and unnoticed from standard benchmarks. On the basis of these results, we conclude that there is a need for eligible validation experiments and benchmark data sets suited to ML for more bias-controlled validation in ML-based SBVS. Therefore, we provide guidelines for setting up validation experiments and give a perspective on how new data sets could be generated.
- Published
- 2019
12. StructureProfiler: an all-in-one tool for 3D protein structure profiling.
- Author
- Meyder A, Kampen S, Sieg J, Fährrolfes R, Friedrich NO, Flachsenberg F, and Rarey M
- Subjects
- Computational Biology, Drug Design, Proteins, Software
- Abstract
Motivation: Three-dimensional protein structures are important starting points for elucidating protein function and applications like drug design. Computational methods in this area rely on high quality validation datasets which are usually manually assembled. Due to the increase in published structures as well as the increasing demand for specially tailored validation datasets, automatic procedures should be adopted. Results: StructureProfiler is a new tool for automatic, objective and customizable profiling of X-ray protein structures based on the most frequently applied selection criteria currently in use to assemble benchmark datasets. As examples, four dataset configurations (Astex, Iridium, Platinum, combined), all results of the combined tests and the list of all PDB IDs passing the combined criteria set are attached in the Supplementary Material. Availability and Implementation: StructureProfiler is available as part of the ProteinsPlus web service http://proteins.plus and as standalone tool in the NAOMI ChemBio Suite. Dataset updates together with the tool can be found on http://www.zbh.uni-hamburg.de/structureprofiler. Supplementary Information: Supplementary data are available at Bioinformatics online. (© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.)
- Published
- 2019
13. Conformator: A Novel Method for the Generation of Conformer Ensembles.
- Author
- Friedrich NO, Flachsenberg F, Meyder A, Sommer K, Kirchmair J, and Rarey M
- Subjects
- Algorithms, Cluster Analysis, Macrocyclic Compounds chemistry, Models, Molecular, Quantitative Structure-Activity Relationship, Time Factors, Drug Design, Molecular Conformation
- Abstract
Computer-aided drug design methods such as docking, pharmacophore searching, 3D database searching, and the creation of 3D-QSAR models need conformational ensembles to handle the flexibility of small molecules. Here, we present Conformator, an accurate and effective knowledge-based algorithm for generating conformer ensembles. With 99.9% of all test molecules processed, Conformator stands out by its robustness with respect to input formats, molecular geometries, and the handling of macrocycles. With an extended set of rules for sampling torsion angles, a novel algorithm for macrocycle conformer generation, and a new clustering algorithm for the assembly of conformer ensembles, Conformator reaches a median minimum root-mean-square deviation (measured between protein-bound ligand conformations and ensembles of a maximum of 250 conformers) of 0.47 Å with no significant difference to the highest-ranked commercial algorithm OMEGA and significantly higher accuracy than seven free algorithms, including the RDKit DG algorithm. Conformator is freely available for noncommercial use and academic research.
- Published
- 2019
14. NAOMInext - Synthetically feasible fragment growing in a structure-based design context.
- Author
- Sommer K, Flachsenberg F, and Rarey M
- Subjects
- Crystallography, Ligands, Molecular Conformation, Molecular Docking Simulation methods, Structure-Activity Relationship, Algorithms, Drug Discovery, Workflow
- Abstract
For decades, de novo design of small molecules has been used intensively, and fragment-based drug discovery (FBDD) approaches are still gaining popularity. Recent publications considering synthetically feasible de novo drug design underline the ongoing need for new methods. Algorithms and tools are under continuous development, but a combination of intuitive usage, acceptable runtime, and a thoroughly evaluated workflow on large-scale data sets is still a rarity. Here, we present an intuitive approach for constrained synthetically feasible fragment growing. Starting from a fragment within its crystallized structure, building blocks are attached via covalent bond formation to build up larger ligands. Iteratively, conformations are generated inside the binding site and scored to find the most suitable one. To cope with the combinatorial explosion of large flexible building blocks, a novel dynamic adaptation algorithm is introduced. The technique achieves low runtimes while keeping high accuracies. The developed workflow is evaluated on a large-scale data set of 264 co-crystallized fragments with their corresponding elaborated ligands. Using our approach for fragment-based ligand growing, we were able to generate putative ligands within an RMSD of less than 2 Å compared to their crystallized structures. Additionally, we were able to show the benefit of a monolithic, tethered-docking-like methodology compared to state-of-the-art docking. We incorporated our method, NAOMInext, in a clearly arranged graphical user interface that assists the user by defining valuable constraints to improve and accelerate the sampling workflow. In combination with predefined synthetic reaction rules, NAOMInext efficiently suggests ideas for the next generation of novel lead compounds. (Copyright © 2018 Elsevier Masson SAS. All rights reserved.)
- Published
- 2019
15. Placement of Water Molecules in Protein Structures: From Large-Scale Evaluations to Single-Case Examples.
- Author
- Nittinger E, Flachsenberg F, Bietz S, Lange G, Klein R, and Rarey M
- Subjects
- Binding Sites, Databases, Protein, Ligands, Models, Molecular, Protein Conformation, Thermodynamics, Proteins chemistry, Water chemistry
- Abstract
Water molecules are of great importance for the correct representation of ligand binding interactions. Throughout the last years, water molecules and their integration into drug design strategies have received increasing attention. Nowadays a variety of tools are available to place and score water molecules. However, the most frequently applied software solutions require substantial computational resources. In addition, none of the existing methods has been rigorously evaluated on the basis of a large number of diverse protein complexes. Therefore, we present a novel method for placing water molecules, called WarPP, based on interaction geometries previously derived from protein crystal structures. Using a large, previously compiled, high-quality validation set of almost 1500 protein-ligand complexes containing almost 20 000 crystallographically observed water molecules in their active sites, we validated our placement strategy. We correctly placed 80% of the water molecules within 1.0 Å of a crystallographically observed one.
- Published
- 2018
16. Benchmarking Commercial Conformer Ensemble Generators.
- Author
- Friedrich NO, de Bruyn Kops C, Flachsenberg F, Sommer K, Rarey M, and Kirchmair J
- Subjects
- Benchmarking, Drug Discovery, Models, Molecular, Molecular Conformation
- Abstract
We assess and compare the performance of eight commercial conformer ensemble generators (ConfGen, ConfGenX, cxcalc, iCon, MOE LowModeMD, MOE Stochastic, MOE Conformation Import, and OMEGA) and one leading free algorithm, the distance geometry algorithm implemented in RDKit. The comparative study is based on a new version of the Platinum Diverse Dataset, a high-quality benchmarking dataset of 2859 protein-bound ligand conformations extracted from the PDB. Differences in the performance of commercial algorithms are much smaller than those observed for free algorithms in our previous study (J. Chem. Inf. Model. 2017, 57, 529-539). For commercial algorithms, the median minimum root-mean-square deviations measured between protein-bound ligand conformations and ensembles of a maximum of 250 conformers are between 0.46 and 0.61 Å. Commercial conformer ensemble generators are characterized by their high robustness, with at least 99% of all input molecules successfully processed and few or even no substantial geometrical errors detectable in their output conformations. The RDKit distance geometry algorithm (with minimization enabled) appears to be a good free alternative since its performance is comparable to that of the midranked commercial algorithms. Based on a statistical analysis, we elaborate on which algorithms to use and how to parametrize them for best performance in different application scenarios.
- Published
- 2017
17. From cheminformatics to structure-based design: Web services and desktop applications based on the NAOMI library.
- Author
- Bietz S, Inhester T, Lauck F, Sommer K, von Behren MM, Fährrolfes R, Flachsenberg F, Meyder A, Nittinger E, Otto T, Hilbig M, Schomburg KT, Volkamer A, and Rarey M
- Subjects
- Databases, Protein, Computational Biology, Internet, Software
- Abstract
Nowadays, computational approaches are an integral part of life science research. Problems related to interpretation of experimental results, data analysis, or visualization tasks highly benefit from the achievements of the digital era. Simulation methods facilitate predictions of physicochemical properties and can assist in understanding macromolecular phenomena. Here, we will give an overview of the methods developed in our group that aim at supporting researchers from all life science areas. Based on state-of-the-art approaches from structural bioinformatics and cheminformatics, we provide software covering a wide range of research questions. Our all-in-one web service platform ProteinsPlus (http://proteins.plus) offers solutions for pocket and druggability prediction, hydrogen placement, structure quality assessment, ensemble generation, protein-protein interaction classification, and 2D-interaction visualization. Additionally, we provide a software package that contains tools targeting cheminformatics problems like file format conversion, molecule data set processing, SMARTS editing, fragment space enumeration, and ligand-based virtual screening. Furthermore, it also includes structural bioinformatics solutions for inverse screening, binding site alignment, and searching interaction patterns across structure libraries. The software package is available at http://software.zbh.uni-hamburg.de. (Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.)
- Published
- 2017
18. ProteinsPlus: a web portal for structure analysis of macromolecules.
- Author
- Fährrolfes R, Bietz S, Flachsenberg F, Meyder A, Nittinger E, Otto T, Volkamer A, and Rarey M
- Subjects
- Binding Sites, Hydrogen chemistry, Internet, Ligands, Models, Molecular, Protein Interaction Mapping, Proteins chemistry, Protein Conformation, Software
- Abstract
With currently more than 126 000 publicly available structures and an increasing growth rate, the Protein Data Bank constitutes a rich data source for structure-driven research in fields like drug discovery, crop science and biotechnology in general. Typical workflows in these areas involve manifold computational tools for the analysis and prediction of molecular functions. Here, we present the ProteinsPlus web server that offers a unified easy-to-use interface to a broad range of tools for the early phase of structure-based molecular modeling. This includes solutions for commonly required pre-processing tasks like structure quality assessment (EDIA), hydrogen placement (Protoss) and the search for alternative conformations (SIENA). Beyond that, it also addresses frequent problems such as the generation of 2D-interaction diagrams (PoseView), protein-protein interface classification (HyPPI) as well as automatic pocket detection and druggability assessment (DoGSiteScorer). The unified ProteinsPlus interface covering all featured approaches provides various facilities for intuitive input and result visualization, case-specific parameterization and download options for further processing. Moreover, its generalized workflow allows the user a quick familiarization with the different tools. ProteinsPlus also stores the calculated results temporarily for future request and thus facilitates convenient result communication and re-access. The server is freely available at http://proteins.plus. (© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2017
19. High-Quality Dataset of Protein-Bound Ligand Conformations and Its Application to Benchmarking Conformer Ensemble Generators.
- Author
- Friedrich NO, Meyder A, de Bruyn Kops C, Sommer K, Flachsenberg F, Rarey M, and Kirchmair J
- Subjects
- Benchmarking, Ligands, Models, Molecular, Molecular Conformation, Platinum chemistry, Platinum metabolism, Proteins metabolism, Time Factors, Drug Design, Informatics methods
- Abstract
We developed a cheminformatics pipeline for the fully automated selection and extraction of high-quality protein-bound ligand conformations from X-ray structural data. The pipeline evaluates the validity and accuracy of the 3D structures of small molecules according to multiple criteria, including their fit to the electron density and their physicochemical and structural properties. Using this approach, we compiled two high-quality datasets from the Protein Data Bank (PDB): a comprehensive dataset and a diversified subset of 4626 and 2912 structures, respectively. The datasets were applied to benchmarking seven freely available conformer ensemble generators: Balloon (two different algorithms), the RDKit standard conformer ensemble generator, the Experimental-Torsion basic Knowledge Distance Geometry (ETKDG) algorithm, Confab, Frog2 and Multiconf-DOCK. Substantial differences in the performance of the individual algorithms were observed, with RDKit and ETKDG generally achieving a favorable balance of accuracy, ensemble size and runtime. The Platinum datasets are available for download from http://www.zbh.uni-hamburg.de/platinum_dataset.
- Published
- 2017
20. RingDecomposerLib: An Open-Source Implementation of Unique Ring Families and Other Cycle Bases.
- Author
- Flachsenberg F, Andresen N, and Rarey M
- Subjects
- Databases, Pharmaceutical, Informatics methods
- Abstract
Many cheminformatics applications like aromaticity detection, SMARTS matching, or the calculation of atomic coordinates require a chemically meaningful perception of the molecular ring topology. The unique ring families (URFs) were recently introduced as a unique, polynomial, and chemically meaningful description of the ring topology. Here we present the first open-source implementation of the URF concept for ring perception. The C library RingDecomposerLib is easy to use, portable, well-documented, and thoroughly tested. Aside from the URFs, other related ring topology descriptions like the relevant cycles (RCs), relevant cycle prototypes (RCPs), and a smallest set of smallest rings (SSSR) can be calculated. We demonstrate the runtime efficiency of the RingDecomposerLib with computing time benchmarks for the complete PubChem Compound Database and thereby show the applicability in large-scale and interactive applications.
- Published
- 2017
21. Feasibility of Active Machine Learning for Multiclass Compound Classification.
- Author
- Lang T, Flachsenberg F, von Luxburg U, and Rarey M
- Subjects
- Animals, Feasibility Studies, Feedback, Humans, Drug Discovery methods, Supervised Machine Learning
- Abstract
A common task in the hit-to-lead process is classifying sets of compounds into multiple, usually structural classes, which build the groundwork for subsequent SAR studies. Machine learning techniques can be used to automate this process by learning classification models from training compounds of each class. Gathering class information for compounds can be cost-intensive as the required data needs to be provided by human experts or experiments. This paper studies whether active machine learning can be used to reduce the required number of training compounds. Active learning is a machine learning method which processes class label data in an iterative fashion. It has gained much attention in a broad range of application areas. In this paper, an active learning method for multiclass compound classification is proposed. This method selects informative training compounds so as to optimally support the learning progress. The combination with human feedback leads to a semiautomated interactive multiclass classification procedure. This method was investigated empirically on 15 compound classification tasks containing 86-2870 compounds in 3-38 classes. The empirical results show that active learning can solve these classification tasks using 10-80% of the data which would be necessary for standard learning techniques.
- Published
- 2016