Author: "Spjuth O" / Publication Type: Academic Journals - Searchworks@Jio Institute Digital Library Search Results

Your search for '"Spjuth O"' returned 133 results.

1. Cell morphology descriptors and gene ontology profiles improve prediction for mitochondrial toxicity

Author: Seal, S., Trapotsi, M.A., Puigvert, J.C., Yang, H., Spjuth, O., and Bender, A.
Published: 2021
Full Text: View/download PDF

2. OpenRiskNet, an open e-infrastructure to support data sharing, knowledge integration and in silico analysis and modelling in risk assessment

Author: Exner, T.E., Dokler, J., Bachler, D., Farcal, L.R., Evelo, C.T., Willighagen, E., Jennen, D.G.J., Jacobs, M., Doganis, P., Sarimveis, H., Lynch, I., Gkoutos, G., Kramer, S., Notredame, C., Spjuth, O., Jennings, P., Dudgeon, T., Bois, F., and Hardy, B.
Published: 2018
Full Text: View/download PDF

3. Linking the Resource Description Framework to cheminformatics and proteochemometrics

Author: Willighagen Egon L, Alvarsson Jonathan, Andersson Annsofie, Eklund Martin, Lampa Samuel, Lapins Maris, Spjuth Ola, and Wikberg Jarl ES
Subjects: Computer applications to medicine. Medical informatics, R858-859.7
Abstract: Background: Semantic web technologies are finding their way into the life sciences. Ontologies and semantic markup have already been used for more than a decade in molecular sciences, but have not yet found widespread use. The semantic web technology Resource Description Framework (RDF) and related methods are shown to be sufficiently versatile to change that situation. Results: The work presented here focuses on linking RDF approaches to existing molecular chemometrics fields, including cheminformatics, QSAR modeling and proteochemometrics. Applications are presented that link RDF technologies to methods from statistics and cheminformatics, including data aggregation, visualization, chemical identification, and property prediction. They demonstrate how this can be done using various existing RDF standards and cheminformatics libraries. For example, we show how IC50 and Ki values are modeled for a number of biological targets using data from the ChEMBL database. Conclusions: We have shown that existing RDF standards can suitably be integrated into existing molecular chemometrics methods. Platforms that unite these technologies, like Bioclipse, make this even simpler and more transparent. Being able to create and share workflows that integrate data aggregation and analysis (visual and statistical) is beneficial to interoperability and reproducibility. The current work shows that RDF approaches are sufficiently powerful to support molecular chemometrics workflows.
Published: 2011
Full Text: View/download PDF

4. Computational toxicology using the OpenTox application programming interface and Bioclipse

Author: Willighagen Egon L, Jeliazkova Nina, Hardy Barry, Grafström Roland C, and Spjuth Ola
Subjects: Medicine, Biology (General), QH301-705.5, Science (General), Q1-390
Abstract: Background: Toxicity is a complex phenomenon involving potential adverse effects on a range of biological functions. Predicting toxicity involves using a combination of experimental data (endpoints) and computational methods to generate a set of predictive models. Such models rely strongly on being able to integrate information from many sources. Integrating biological and chemical information sources, however, requires a common language to express our knowledge ontologically, and interoperating services to build reliable predictive toxicology applications. Findings: This article describes progress in extending the integrative bio- and cheminformatics platform Bioclipse to interoperate with OpenTox, a semantic web framework which supports open data exchange and toxicology model building. The Bioclipse workbench environment enables functionality from OpenTox web services and easy access to OpenTox resources for evaluating toxicity properties of query molecules. Relevant cases and interfaces based on ten neurotoxins are described to demonstrate the capabilities provided to the user. The integration takes advantage of semantic web technologies, thereby providing an open and simplified communication standard. Additionally, the use of ontologies ensures proper interoperation and reliable integration of toxicity information from both experimental and computational sources. Conclusions: A novel computational toxicity assessment platform was generated from the integration of two open science platforms related to toxicology: Bioclipse, which combines a rich scriptable and graphical workbench environment for integrating diverse information sources, and OpenTox, a platform for interoperable toxicology data and computational services. The combination provides improved reliability and operability for handling large data sets through the Open Standards of the OpenTox Application Programming Interface. This enables simultaneous access to a variety of distributed predictive toxicology databases, algorithms and model resources, with the Bioclipse workbench handling the technical layers.
Published: 2011
Full Text: View/download PDF

5. Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on

Author: O'Boyle Noel M, Guha Rajarshi, Willighagen Egon L, Adams Samuel E, Alvarsson Jonathan, Bradley Jean-Claude, Filippov Igor V, Hanson Robert M, Hanwell Marcus D, Hutchison Geoffrey R, James Craig A, Jeliazkova Nina, Lang Andrew SID, Langner Karol M, Lonie David C, Lowe Daniel M, Pansanel Jérôme, Pavlov Dmitry, Spjuth Ola, Steinbeck Christoph, Tenderholt Adam L, Theisen Kevin J, and Murray-Rust Peter
Subjects: Information technology, T58.5-58.64, Chemistry, QD1-999
Abstract: Background: The Blue Obelisk movement was established in 2005 as a response to the lack of Open Data, Open Standards and Open Source (ODOSOS) in chemistry. It aims to make it easier to carry out chemistry research by promoting interoperability between chemistry software, encouraging cooperation between Open Source developers, and developing community resources and Open Standards. Results: This contribution looks back on the work carried out by the Blue Obelisk in the past 5 years and surveys progress and remaining challenges in the areas of Open Data, Open Standards, and Open Source in chemistry. Conclusions: We show that the Blue Obelisk has been very successful in bringing together researchers and developers with common interests in ODOSOS, leading to the development of many useful resources freely available to the chemistry community.
Published: 2011
Full Text: View/download PDF

6. Brunn: An open source laboratory information system for microplates with a graphical plate layout design process

Author: Larsson Rolf, Spjuth Ola, Andersson Claes, Alvarsson Jonathan, and Wikberg Jarl ES
Subjects: Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: Compound profiling and drug screening generate large amounts of data and are generally based on microplate assays. Current information systems used for handling this are mainly commercial, closed source, expensive, and heavyweight, and there is a need for a flexible, lightweight, open system for handling plate design, and validation and preparation of data. Results: A Bioclipse plugin consisting of a client part and a relational database was constructed. A multiple-step plate layout point-and-click interface was implemented inside Bioclipse. The system contains a data validation step, where outliers can be removed, and finally a plate report with all relevant calculated data, including dose-response curves. Conclusions: Brunn is capable of handling the data from microplate assays. It can create dose-response curves and calculate IC50 values. Using a system of this sort facilitates work in the laboratory. Being able to reuse already constructed plates and plate layouts by starting out from an earlier step in the plate layout design process saves time and cuts down on error sources.
Published: 2011
Full Text: View/download PDF

7. Use of historic metabolic biotransformation data as a means of anticipating metabolic sites using MetaPrint2D and Bioclipse

Author: Glen Robert C, Adams Samuel, Spjuth Ola, Carlsson Lars, and Boyer Scott
Subjects: Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: Predicting metabolic sites is important in the drug discovery process to aid in rapid compound optimisation. No interactive tool exists, and most of the useful tools are quite expensive. Results: Here a fast and reliable method to analyse ligands and visualise potential metabolic sites is presented, based on annotated metabolic data described by circular fingerprints. The method is available via the graphical workbench Bioclipse, which is equipped with advanced features in cheminformatics. Conclusions: Due to the speed of predictions (less than 50 ms per molecule), scientists can get real-time decision support when editing chemical structures. Bioclipse is a rich client, which means that all calculations are performed on the local computer and do not require a network connection. Bioclipse and MetaPrint2D are free for all users, released under open source licenses, and available from http://www.bioclipse.net.
Published: 2010
Full Text: View/download PDF

8. Towards interoperable and reproducible QSAR analyses: Exchange of datasets

Author: Spjuth Ola, Willighagen Egon L, Guha Rajarshi, Eklund Martin, and Wikberg Jarl ES
Subjects: Information technology, T58.5-58.64, Chemistry, QD1-999
Abstract: Background: QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constraining collaborations and re-use of data. Results: We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Conclusions: Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusion regarding descriptors by defining them crisply. This makes it easy to join, extend and combine datasets and hence work collectively, but also allows for analyzing the effect descriptors have on the statistical model's performance. The presented Bioclipse plugins equip scientists with graphical tools that make QSAR-ML easily accessible for the community.
Published: 2010
Full Text: View/download PDF

9. An eScience-Bayes strategy for analyzing omics data

Author: Spjuth Ola, Eklund Martin, and Wikberg Jarl ES
Subjects: Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: The omics fields promise to revolutionize our understanding of biology and biomedicine. However, their potential is compromised by the challenge of analyzing the huge datasets produced. Analysis of omics data is plagued by the curse of dimensionality, resulting in imprecise estimates of model parameters and performance. Moreover, the integration of omics data with other data sources is difficult to shoehorn into classical statistical models. This has resulted in ad hoc approaches to address specific problems. Results: We present a general approach to omics data analysis that alleviates these problems. By combining eScience and Bayesian methods, we retrieve scientific information and data from multiple sources and coherently incorporate them into large models. These models improve the accuracy of predictions and offer new insights into the underlying mechanisms. This "eScience-Bayes" approach is demonstrated in two proof-of-principle applications, one for breast cancer prognosis prediction from transcriptomic data and one for protein-protein interaction studies based on proteomic data. Conclusions: Bayesian statistics provide the flexibility to tailor statistical models to the complex data structures in omics biology as well as permitting coherent integration of multiple data sources. However, Bayesian methods are in general computationally demanding and require specification of possibly thousands of prior distributions. eScience can help us overcome these difficulties. The eScience-Bayes approach thus permits us to fully leverage the advantages of Bayesian methods, resulting in models with improved predictive performance that give more information about the underlying biological system.
Published: 2010
Full Text: View/download PDF

10. Bioclipse 2: A scriptable integration platform for the life sciences

Author: Wagener Johannes, Torrance Gilleain, Mäsak Carl, Kuhn Stefan, Eklund Martin, Berg Arvid, Alvarsson Jonathan, Spjuth Ola, Willighagen Egon L, Steinbeck Christoph, and Wikberg Jarl ES
Subjects: Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: Contemporary biological research integrates neighboring scientific domains to answer complex questions in fields such as systems biology and drug discovery. This calls for tools that are intuitive to use, yet flexible enough to adapt to new tasks. Results: Bioclipse is a free, open source workbench with advanced features for the life sciences. Version 2.0 constitutes a complete rewrite of Bioclipse, and delivers a stable, scalable integration platform for developers and an intuitive workbench for end users. All functionality is available both from the graphical user interface and from a built-in novel domain-specific language, supporting the scientist in interdisciplinary research and reproducible analyses through advanced visualization of the inputs and the results. New components for Bioclipse 2 include a rewritten editor for chemical structures, a table for multiple molecules that supports gigabyte-sized files, and a graphical editor for sequences and alignments. Conclusion: Bioclipse 2 is equipped with advanced tools required to carry out complex analysis in the fields of bio- and cheminformatics. Developed as a Rich Client based on Eclipse, Bioclipse 2 leverages today's powerful desktop computers to provide a responsive user interface, but also takes full advantage of the Web and networked (Web/Cloud) services for more demanding calculations or retrieval of data. The fact that Bioclipse 2 is based on an advanced and widely used service platform ensures wide extensibility, making it easy to add new algorithms, visualizations, and scripting commands. The intuitive tools for end users and the extensible architecture make Bioclipse 2 ideal for interdisciplinary and integrative research. Bioclipse 2 is released under the Eclipse Public License (EPL), a flexible open source license that allows additional plugins to be of any license. Bioclipse 2 is implemented in Java and supported on all major platforms; source code and binaries are freely available at http://www.bioclipse.net.
Published: 2009
Full Text: View/download PDF

11. XMPP for cloud computing in bioinformatics supporting discovery and invocation of asynchronous web services

Author: Willighagen Egon L, Spjuth Ola, Wagener Johannes, and Wikberg Jarl ES
Subjects: Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: The life sciences make heavy use of the web for both data provision and analysis. However, the increasing amount of available data and the diversity of analysis tools call for machine-accessible interfaces in order to be effective. HTTP-based Web service technologies, like the Simple Object Access Protocol (SOAP) and REpresentational State Transfer (REST) services, are today the most common technologies for this in bioinformatics. However, these methods have severe drawbacks, including lack of discoverability and the inability of services to send status notifications. Several complementary workarounds have been proposed, but the results are ad hoc solutions of varying quality that can be difficult to use. Results: We present a novel approach based on the open standard Extensible Messaging and Presence Protocol (XMPP), consisting of an extension (IO Data) to comprise discovery, asynchronous invocation, and definition of data types in the service. Because XMPP cloud services are capable of asynchronous communication, clients do not have to poll repeatedly for status; instead, the service sends the results back to the client upon completion. Implementations for Bioclipse and Taverna are presented, as are various XMPP cloud services in bio- and cheminformatics. Conclusion: XMPP with its extensions is a powerful protocol for cloud services that demonstrates several advantages over traditional HTTP-based Web services: 1) services are discoverable without the need for an external registry, 2) asynchronous invocation eliminates the need for ad hoc solutions like polling, and 3) input and output types defined in the service allow clients to be generated on the fly without the need for an external semantics description. The many advantages over existing technologies make XMPP a highly interesting candidate for next-generation online services in bioinformatics.
Published: 2009
Full Text: View/download PDF

12. The C1C2: A framework for simultaneous model selection and assessment

Author: Wikberg Jarl ES, Spjuth Ola, and Eklund Martin
Subjects: Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: There has been recent concern regarding the inability of predictive modeling approaches to generalize to new data. Some of the problems can be attributed to improper methods for model selection and assessment. Here, we have addressed this issue by introducing a novel and general framework, the C1C2, for simultaneous model selection and assessment. The framework relies on a partitioning of the data in order to separate model choice from model assessment in terms of the data used. Since the number of conceivable models is in general vast, it was also of interest to investigate the use of two automatic search methods, a genetic algorithm and a brute-force method, for model choice. As a demonstration, the C1C2 was applied to simulated and real-world datasets. A penalized linear model was assumed to reasonably approximate the true relation between the dependent and independent variables, thus reducing the model choice problem to a matter of variable selection and choice of penalizing parameter. We also studied the impact of assuming prior knowledge about the number of relevant variables on model choice and generalization error estimates. The results obtained with the C1C2 were compared to those obtained by employing repeated K-fold cross-validation for choosing and assessing a model. Results: The C1C2 framework performed well at finding the true model in terms of choosing the correct variable subset and producing reasonable choices for the penalizing parameter, even in situations when the independent variables were highly correlated and when the number of observations was less than the number of variables. The C1C2 framework was also found to give accurate estimates of the generalization error. Prior information about the number of important independent variables improved the variable subset choice but reduced the accuracy of generalization error estimates. Using the genetic algorithm worsened the model choice but not the generalization error estimates, compared to using the brute-force method. The results obtained with repeated K-fold cross-validation were similar to those produced by the C1C2 in terms of model choice; however, a lower accuracy of the generalization error estimates was observed. Conclusion: The C1C2 framework was demonstrated to work well for finding the true model within a penalized linear model class and accurately assessing its generalization error, even for datasets with many highly correlated independent variables, a low observation-to-variable ratio, and model assumption deviations. A complete separation of model choice and model assessment in terms of the data used for each task improves the estimates of the generalization error.
Published: 2008
Full Text: View/download PDF

13. Proteochemometric modeling of HIV protease susceptibility

Author: Prusis Peteris, Spjuth Ola, Eklund Martin, Lapins Maris, and Wikberg Jarl ES
Subjects: Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: A major obstacle in the treatment of HIV is the ability of the virus to mutate rapidly into drug-resistant variants. A method for predicting the susceptibility of mutated HIV strains to antiviral agents would provide substantial clinical benefit as well as facilitate the development of new candidate drugs. Therefore, we used proteochemometrics to model the susceptibility of HIV to protease inhibitors in current use, utilizing descriptions of the physico-chemical properties of mutated HIV proteases and 3D structural property descriptions for the protease inhibitors. The descriptions were correlated to the susceptibility data of 828 unique HIV protease variants for seven protease inhibitors in current use; the data set comprised 4792 protease-inhibitor combinations. Results: The model provided excellent predictability (R² = 0.92, Q² = 0.87) and identified general and specific features of drug resistance. The model's predictive ability was verified by external prediction in which the susceptibilities to each one of the seven inhibitors were omitted from the data set, one inhibitor at a time, and the data for the six remaining compounds were used to create new models. This analysis showed that the overall predictive ability for the omitted inhibitors was Q²(inhibitors) = 0.72. Conclusion: Our results show that a proteochemometric approach can provide generalized susceptibility predictions for new inhibitors. Our proteochemometric model can directly analyze inhibitor-protease interactions and facilitate treatment selection based on viral genotype. The model is available for public use, and is located at HIV Drug Research Centre.
Published: 2008
Full Text: View/download PDF
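For readers unfamiliar with the R²/Q² statistics quoted in this record, the conventional chemometrics definitions are sketched below. Here ŷ_i is the fitted value for observation i from the model trained on all data, ŷ_{i,cv} is the prediction for observation i from a model fitted without it (cross-validation), and ȳ is the mean response. The abstract does not state which cross-validation scheme produced Q² = 0.87, so this is a generic sketch under that assumption, not the paper's exact procedure; the leave-one-inhibitor-out Q² follows the same form with each inhibitor's data held out in turn.

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
Q^2 = 1 - \frac{\mathrm{PRESS}}{\sum_i (y_i - \bar{y})^2},
\qquad
\mathrm{PRESS} = \sum_i \bigl(y_i - \hat{y}_{i,\mathrm{cv}}\bigr)^2
```

Because Q² uses held-out predictions, it is normally lower than R², as in the 0.92 versus 0.87 reported here.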

14. Bioclipse: an open source workbench for chemo- and bioinformatics

Author: Wagener Johannes, Eklund Martin, Kuhn Stefan, Willighagen Egon L, Helmus Tobias, Spjuth Ola, Murray-Rust Peter, Steinbeck Christoph, and Wikberg Jarl ES
Subjects: Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: There is a need for software applications that provide users with a complete and extensible toolkit for chemo- and bioinformatics accessible from a single workbench. Commercial packages are expensive and closed source, hence they do not allow end users to modify algorithms and add custom functionality. Existing open source projects are more focused on providing a framework for integrating existing, separately installed bioinformatics packages, rather than providing user-friendly interfaces. No open source chemoinformatics workbench has previously been published, and no successful attempts have been made to integrate chemo- and bioinformatics into a single framework. Results: Bioclipse is an advanced workbench for resources in chemo- and bioinformatics, such as molecules, proteins, sequences, spectra, and scripts. It provides 2D editing, 3D visualization, file format conversion, calculation of chemical properties, and much more, all fully integrated into a user-friendly desktop application. Editing supports standard functions such as cut and paste, drag and drop, and undo/redo. Bioclipse is written in Java and based on the Eclipse Rich Client Platform with a state-of-the-art plugin architecture. This gives Bioclipse an advantage over other systems, as it can easily be extended with functionality in any desired direction. Conclusion: Bioclipse is a powerful workbench for bio- and chemoinformatics as well as an advanced integration platform. The rich functionality, intuitive user interface, and powerful plugin architecture make Bioclipse the most advanced and user-friendly open source workbench for chemo- and bioinformatics. Bioclipse is released under the Eclipse Public License (EPL), an open source license which sets no constraints on external plugin licensing; it is totally open for both open source and commercial plugins. Bioclipse is freely available at http://www.bioclipse.net.
Published: 2007
Full Text: View/download PDF

15. Improved Detection of Drug-Induced Liver Injury by Integrating Predicted In Vivo and In Vitro Data.

Author: Seal S, Williams D, Hosseini-Gerami L, Mahale M, Carpenter AE, Spjuth O, and Bender A
Subjects: Humans, Animals, Rats, Chemical and Drug Induced Liver Injury
Abstract: Drug-induced liver injury (DILI) has been a significant challenge in drug discovery, often leading to clinical trial failures and necessitating drug withdrawals. Over the last decade, the existing suite of in vitro proxy-DILI assays has generally improved at identifying compounds with hepatotoxicity. However, there is considerable interest in enhancing the in silico prediction of DILI because it allows for evaluating large sets of compounds more quickly and cost-effectively, particularly in the early stages of projects. In this study, we aim to study ML models for DILI prediction that first predict nine proxy-DILI labels and then use them as features, in addition to chemical structural features, to predict DILI. The features include in vitro data (e.g., mitochondrial toxicity, bile salt export pump inhibition), in vivo data (e.g., preclinical rat hepatotoxicity studies), pharmacokinetic parameters of maximum concentration, structural fingerprints, and physicochemical parameters. We trained DILI-prediction models on 888 compounds from the DILI data set (composed of DILIst and DILIrank) and tested them on a held-out external test set of 223 compounds from the DILI data set. The best model, DILIPredictor, attained an AUC-PR of 0.79. This model enabled the detection of the top 25 toxic compounds (2.68 LR+, positive likelihood ratio) compared to models using only structural features (1.65 LR+ score). Using feature interpretation from DILIPredictor, we identified the chemical substructures causing DILI and differentiated cases of DILI caused by compounds in animals but not in humans. For example, DILIPredictor correctly recognized 2-butoxyethanol as nontoxic in humans despite its hepatotoxicity in mouse models. Overall, the DILIPredictor model improves the detection of compounds causing DILI with an improved differentiation between animal and human sensitivity and the potential for mechanism evaluation. DILIPredictor requires only chemical structures as input for prediction and is publicly available at https://broad.io/DILIPredictor for use via a web interface, with all code available for download.
Published: 2024
Full Text: View/download PDF
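The positive likelihood ratio (LR+) cited in this record has the standard definition below, where TP, FN, FP and TN are the counts of a 2×2 confusion matrix. The abstract does not describe how the top-25 ranking was turned into such a table, so this shows only the general formula, not the paper's exact computation.

```latex
\mathrm{LR}^{+}
= \frac{\text{sensitivity}}{1 - \text{specificity}}
= \frac{\mathrm{TP}/(\mathrm{TP}+\mathrm{FN})}{\mathrm{FP}/(\mathrm{FP}+\mathrm{TN})}
```

Read this way, an LR+ of 2.68 means a positive flag is 2.68 times as likely to occur for a DILI-positive compound as for a DILI-negative one, compared with 1.65 times for the structure-only models.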

16. Artificial intelligence for high content imaging in drug discovery.

Author: Carreras-Puigvert J and Spjuth O
Subjects: Humans, Drug Discovery methods, Artificial Intelligence
Abstract: Artificial intelligence (AI) and high-content imaging (HCI) are contributing to advancements in drug discovery, propelled by the recent progress in deep neural networks. This review highlights AI's role in the analysis of HCI data from fixed and live-cell imaging, enabling novel label-free and multi-channel fluorescent screening methods, and improving compound profiling. HCI experiments are rapid and cost-effective, facilitating large data set accumulation for AI model training. However, the success of AI in drug discovery also depends on high-quality data, reproducible experiments, and robust validation to ensure model performance. Despite challenges like the need for annotated compounds and managing vast image data, AI's potential in phenotypic screening and drug profiling is significant. Future improvements in AI, including increased interpretability and integration of multiple modalities, are expected to solidify AI and HCI's role in drug discovery.
Competing Interests: OS and JCP declare ownership in Phenaros Pharmaceuticals AB, a company exploiting AI, automation and HCI for drug discovery.
Copyright: © 2024 The Author(s). Published by Elsevier Ltd. All rights reserved.
Published: 2024
Full Text: View/download PDF

17. CPSign: conformal prediction for cheminformatics modeling.

Author: Arvidsson McShane S, Norinder U, Alvarsson J, Ahlberg E, Carlsson L, and Spjuth O
Abstract: Conformal prediction has seen many applications in pharmaceutical science, being able to calibrate outputs of machine learning models and produce valid prediction intervals. We here present the open source software CPSign, a complete implementation of conformal prediction for cheminformatics modeling. CPSign implements inductive and transductive conformal prediction for classification and regression, and probabilistic prediction with the Venn-ABERS methodology. The main chemical representation is signatures, but other types of descriptors are also supported. The main modeling methodology is support vector machines (SVMs), but additional modeling methods are supported via an extension mechanism, e.g. DeepLearning4J models. We also describe features for visualizing results from conformal models, including calibration and efficiency plots, as well as features to publish predictive models as REST services. We compare CPSign against other common cheminformatics modeling approaches, including random forest and a directed message-passing neural network. The results show that CPSign produces robust predictive performance with comparable predictive efficiency, with superior runtime and lower hardware requirements compared to neural network based models. CPSign has been used in several studies and is in production use in multiple organizations. The ability to work directly with chemical input files, and to perform descriptor calculation and modeling with SVM in the conformal prediction framework, in a single software package with a low footprint and fast execution time makes CPSign a convenient yet flexible package for training, deploying, and predicting on chemical data. CPSign can be downloaded from GitHub at https://github.com/arosbio/cpsign. Scientific contribution: CPSign provides a single software package that allows users to perform data preprocessing, modeling and predictions directly on chemical structures, using conformal and probabilistic prediction. Building and evaluating new models can be achieved at a high abstraction level without sacrificing flexibility and predictive performance, as showcased by a method evaluation against contemporary modeling approaches, where CPSign performs on par with a state-of-the-art deep learning based model.
Copyright: © 2024 The Author(s).
Published: 2024
Full Text: View/download PDF

18. A Decade in a Systematic Review: The Evolution and Impact of Cell Painting.

Author: Seal S, Trapotsi MA, Spjuth O, Singh S, Carreras-Puigvert J, Greene N, Bender A, and Carpenter AE
Abstract: High-content image-based assays have fueled significant discoveries in the life sciences in the past decade (2013-2023), including novel insights into disease etiology, mechanism of action, new therapeutics, and toxicology predictions. Here, we systematically review the substantial methodological advancements and applications of Cell Painting. Advancements include improvements in the Cell Painting protocol, assay adaptations for different types of perturbations and applications, and improved methodologies for feature extraction, quality control, and batch effect correction. Moreover, machine learning methods recently surpassed classical approaches in their ability to extract biologically useful information from Cell Painting images. Cell Painting data have been used alone or in combination with other -omics data to decipher the mechanism of action of a compound, its toxicity profile, and many other biological effects. Overall, key methodological advances have expanded Cell Painting's ability to capture cellular responses to various perturbations. Future progress will likely come from advancing computational and experimental techniques, developing new publicly available datasets, and integrating them with other high-content data types. Competing Interests: The authors declare the following competing financial interest(s): S. Singh and A.E.C. serve as scientific advisors for companies that use image-based profiling and Cell Painting (A.E.C.: Recursion, SyzOnc, Quiver Bioscience; S. Singh: Waypoint Bio, Dewpoint Therapeutics, DeepCell) and receive honoraria for occasional talks at pharmaceutical and biotechnology companies. J.C.P. and O.S. declare ownership in Phenaros Pharmaceuticals.
Published: 2024
Full Text: View/download PDF

19. New approach methods to assess developmental and adult neurotoxicity for regulatory use: a PARC work package 5 project.

Author: Tal T, Myhre O, Fritsche E, Rüegg J, Craenen K, Aiello-Holden K, Agrillo C, Babin PJ, Escher BI, Dirven H, Hellsten K, Dolva K, Hessel E, Heusinkveld HJ, Hadzhiev Y, Hurem S, Jagiello K, Judzinska B, Klüver N, Knoll-Gellida A, Kühne BA, Leist M, Lislien M, Lyche JL, Müller F, Colbourne JK, Neuhaus W, Pallocca G, Seeger B, Scharkin I, Scholz S, Spjuth O, Torres-Ruiz M, and Bartmann K
Abstract: In the European regulatory context, rodent in vivo studies are the predominant source of neurotoxicity information. Although they form a cornerstone of neurotoxicological assessments, they are costly and the topic of ethical debate. While the public expects chemicals and products to be safe for the developing and mature nervous systems, considerable numbers of chemicals in commerce have not, or only to a limited extent, been assessed for their potential to cause neurotoxicity. As such, there is a societal push toward the replacement of animal models with in vitro or alternative methods. New approach methods (NAMs) can contribute to the regulatory knowledge base, increase chemical safety, and modernize chemical hazard and risk assessment. Provided they reach an acceptable level of regulatory relevance and reliability, NAMs may be considered as replacements for specific in vivo studies. The European Partnership for the Assessment of Risks from Chemicals (PARC) addresses challenges to the development and implementation of NAMs in chemical risk assessment. In collaboration with regulatory agencies, Project 5.2.1e (Neurotoxicity) aims to develop and evaluate NAMs for developmental neurotoxicity (DNT) and adult neurotoxicity (ANT) and to understand the applicability domain of specific NAMs for the detection of endocrine disruption and epigenetic perturbation. To speed up assay time and reduce costs, we identify early indicators of later-onset effects. Ultimately, we will assemble second-generation developmental neurotoxicity and first-generation adult neurotoxicity test batteries, both of which aim to provide regulatory hazard and risk assessors and industry stakeholders with robust, speedy, lower-cost, and informative next-generation hazard and risk assessment tools. Competing Interests: EF and KB are shareholders of DNTOX GmbH, which offers neurotoxicity testing services.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. (Copyright © 2024 Tal, Myhre, Fritsche, Rüegg, Craenen, Aiello-Holden, Agrillo, Babin, Escher, Dirven, Hellsten, Dolva, Hessel, Heusinkveld, Hadzhiev, Hurem, Jagiello, Judzinska, Klüver, Knoll-Gellida, Kühne, Leist, Lislien, Lyche, Müller, Colbourne, Neuhaus, Pallocca, Seeger, Scharkin, Scholz, Spjuth, Torres-Ruiz and Bartmann.)
Published: 2024
Full Text: View/download PDF

20. Corrigendum: Development of new approach methods for the identification and characterization of endocrine metabolic disruptors-a PARC project.

Author: Braeuning A, Balaguer P, Bourguet W, Carreras-Puigvert J, Feiertag K, Kamstra JH, Knapen D, Lichtenstein D, Marx-Stoelting P, Rietdijk J, Schubert K, Spjuth O, Stinckens E, Thedieck K, van den Boom R, Vergauwen L, von Bergen M, Wewer N, and Zalko D
Abstract: [This corrects the article DOI: 10.3389/ftox.2023.1212509.] (Copyright © 2024 Braeuning, Balaguer, Bourguet, Carreras-Puigvert, Feiertag, Kamstra, Knapen, Lichtenstein, Marx-Stoelting, Rietdijk, Schubert, Spjuth, Stinckens, Thedieck, van den Boom, Vergauwen, von Bergen, Wewer and Zalko.)
Published: 2024
Full Text: View/download PDF

21. From pixels to phenotypes: Integrating image-based profiling with cell health data as BioMorph features improves interpretability.

Author: Seal S, Carreras-Puigvert J, Singh S, Carpenter AE, Spjuth O, and Bender A
Subjects: Phenotype, DNA Replication, Software
Abstract: Cell Painting assays generate morphological profiles that are versatile descriptors of biological systems and have been used to predict in vitro and in vivo drug effects. However, Cell Painting features extracted from classical software such as CellProfiler are based on statistical calculations and are often not readily biologically interpretable. In this study, we propose a new feature space, which we call BioMorph, that maps these Cell Painting features to readouts from comprehensive Cell Health assays. We validated that the resulting BioMorph space effectively connected compounds not only with the morphological features associated with their bioactivity but also with deeper insights into phenotypic characteristics and cellular processes associated with the given bioactivity. The BioMorph space revealed the mechanism of action for individual compounds, including dual-acting compounds such as emetine, an inhibitor of both protein synthesis and DNA replication. Overall, BioMorph space offers a biologically relevant way to interpret the cell morphological features derived using software such as CellProfiler and to generate hypotheses for experimental validation.
Published: 2024
Full Text: View/download PDF

22. Insights into Drug Cardiotoxicity from Biological and Chemical Data: The First Public Classifiers for FDA Drug-Induced Cardiotoxicity Rank.

Author: Seal S, Spjuth O, Hosseini-Gerami L, García-Ortegón M, Singh S, Bender A, and Carpenter AE
Subjects: Humans, Cardiotoxicity etiology, Cardiotoxicity metabolism, Drug Development
Abstract: Drug-induced cardiotoxicity (DICT) is a major concern in drug development, accounting for 10-14% of postmarket withdrawals. In this study, we explored the capabilities of chemical and biological data to predict cardiotoxicity, using the recently released DICTrank data set from the United States FDA. We found that such data, including protein targets, especially those related to ion channels (e.g., hERG), physicochemical properties (e.g., electrotopological state), and peak concentration in plasma offer strong predictive ability for DICT. Compounds annotated with mechanisms of action such as cyclooxygenase inhibition could distinguish between most-concern and no-concern DICT. Cell Painting features for ER stress discerned most-concern cardiotoxic from nontoxic compounds. Models based on physicochemical properties provided substantial predictive accuracy (AUCPR = 0.93). With the availability of omics data in the future, using biological data promises enhanced predictability and deeper mechanistic insights, paving the way for safer drug development. All models from this study are available at https://broad.io/DICTrank_Predictor.
Published: 2024
Full Text: View/download PDF

23. Insights into Drug Cardiotoxicity from Biological and Chemical Data: The First Public Classifiers for FDA DICTrank.

Author: Seal S, Spjuth O, Hosseini-Gerami L, García-Ortegón M, Singh S, Bender A, and Carpenter AE
Abstract: Drug-induced cardiotoxicity (DICT) is a major concern in drug development, accounting for 10-14% of postmarket withdrawals. In this study, we explored the capabilities of various chemical and biological data to predict cardiotoxicity, using the recently released Drug-Induced Cardiotoxicity Rank (DICTrank) dataset from the United States FDA. We analyzed a diverse set of data sources, including physicochemical properties, annotated mechanisms of action (MOA), Cell Painting, Gene Expression, and more, to identify indications of cardiotoxicity. We found that such data, including protein targets, especially those related to ion channels (such as hERG), physicochemical properties (such as electrotopological state) as well as peak concentration in plasma offer strong predictive ability as well as valuable insights into DICT. We also found that compounds annotated with particular mechanisms of action, such as cyclooxygenase inhibition, could distinguish between most-concern and no-concern DICT compounds. Cell Painting features related to ER stress discern the most-concern cardiotoxic compounds from non-toxic compounds. While models based on physicochemical properties currently provide substantial predictive accuracy (AUCPR = 0.93), this study also underscores the potential benefits of incorporating more comprehensive biological data in future DICT predictive models. With the availability of -omics data in the future, using biological data promises enhanced predictability and delivers deeper mechanistic insights, paving the way for safer therapeutic drug development.
All models and data used in this study are publicly released at https://broad.io/DICTrank_Predictor. Competing Interests: Author Declarations: S Singh and AEC serve as scientific advisors for companies that use image-based profiling and Cell Painting (AEC: Recursion, SyzOnc; S Singh: Waypoint Bio, Dewpoint Therapeutics) and receive honoraria for occasional talks at pharmaceutical and biotechnology companies. OS declares shares in Phenaros Pharmaceuticals. LGH is an employee at Ignota Labs, where CellScape is proprietary software. All other authors declare no relevant competing interests.
Published: 2023
Full Text: View/download PDF

24. ELIXIR and Toxicology: a community in development.

Author: Martens M, Stierum R, Schymanski EL, Evelo CT, Aalizadeh R, Aladjov H, Arturi K, Audouze K, Babica P, Berka K, Bessems J, Blaha L, Bolton EE, Cases M, Damalas DE, Dave K, Dilger M, Exner T, Geerke DP, Grafström R, Gray A, Hancock JM, Hollert H, Jeliazkova N, Jennen D, Jourdan F, Kahlem P, Klanova J, Kleinjans J, Kondic T, Kone B, Lynch I, Maran U, Martinez Cuesta S, Ménager H, Neumann S, Nymark P, Oberacher H, Ramirez N, Remy S, Rocca-Serra P, Salek RM, Sallach B, Sansone SA, Sanz F, Sarimveis H, Sarntivijai S, Schulze T, Slobodnik J, Spjuth O, Tedds J, Thomaidis N, Weber RJM, van Westen GJP, Wheelock CE, Williams AJ, Witters H, Zdrazil B, Županič A, and Willighagen EL
Subjects: Europe, Risk Assessment, Biological Science Disciplines
Abstract: Toxicology has been an active research field for many decades, with academic, industrial and government involvement. Modern omics and computational approaches are changing the field, from merely disease-specific observational models into target-specific predictive models. Traditionally, toxicology has strong links with other fields such as biology, chemistry, pharmacology and medicine. With the rise of synthetic and new engineered materials, alongside ongoing prioritisation needs in chemical risk assessment for existing chemicals, early predictive evaluations are becoming of utmost importance for both scientific and regulatory purposes. ELIXIR is an intergovernmental organisation that brings together life science resources from across Europe. To coordinate the linkage of various life science efforts around modern predictive toxicology, the establishment of a new ELIXIR Community is seen as instrumental. In the past few years, joint efforts, building on incidental overlap, have been piloted in the context of ELIXIR. For example, the EU-ToxRisk, diXa, HeCaToS, transQST, and the nanotoxicology community have worked with the ELIXIR TeSS, Bioschemas, and Compute Platforms and activities. In 2018, a core group of interested parties wrote a proposal, outlining a sketch of what this new ELIXIR Toxicology Community would look like. A recent workshop (held September 30th to October 1st, 2020) extended this into an ELIXIR Toxicology roadmap and a shortlist of limited-investment, high-gain collaborations to give body to this new community. This Whitepaper outlines the results of these efforts and defines our vision of the ELIXIR Toxicology Community and how it complements other ELIXIR activities. Competing Interests: No competing interests were disclosed. (Copyright: © 2023 Martens M et al.)
Published: 2023
Full Text: View/download PDF

25. Author Correction: A method for Boolean analysis of protein interactions at a molecular level.

Author: Raykova D, Kermpatsou D, Malmqvist T, Harrison PJ, Sander MR, Stiller C, Heldin J, Leino M, Ricardo S, Klemm A, David L, Spjuth O, Vemuri K, Dimberg A, Sundqvist A, Norlin M, Klaesson A, Kampf C, and Söderberg O
Published: 2023
Full Text: View/download PDF

26. Evaluating the utility of brightfield image data for mechanism of action prediction.

Author: Harrison PJ, Gupta A, Rietdijk J, Wieslander H, Carreras-Puigvert J, Georgiev P, Wählby C, Spjuth O, and Sintorn IM
Subjects: Microscopy, Fluorescence methods, Cells, Cultured, Image Processing, Computer-Assisted methods
Abstract: Fluorescence staining techniques, such as Cell Painting, together with fluorescence microscopy have proven invaluable for visualizing and quantifying the effects that drugs and other perturbations have on cultured cells. However, fluorescence microscopy is expensive, time-consuming, and labor-intensive, and the stains applied can be cytotoxic, interfering with the activity under study. The simplest form of microscopy, brightfield microscopy, lacks these downsides, but the images produced have low contrast and the cellular compartments are difficult to discern. Nevertheless, by harnessing deep learning, these brightfield images may still be sufficient for various predictive purposes. In this study, we compared the predictive performance of models trained on fluorescence images to those trained on brightfield images for predicting the mechanism of action (MoA) of different drugs. We also extracted CellProfiler features from the fluorescence images and used them to benchmark the performance. Overall, we found comparable and largely correlated predictive performance for the two imaging modalities. This is promising for future studies of MoAs in time-lapse experiments for which using fluorescence images is problematic. Explorations based on explainable AI techniques also provided valuable insights regarding compounds that were better predicted by one modality than the other. Competing Interests: The authors have declared that no competing interests exist. (Copyright: © 2023 Harrison et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2023
Full Text: View/download PDF

27. Development of new approach methods for the identification and characterization of endocrine metabolic disruptors-a PARC project.

Author: Braeuning A, Balaguer P, Bourguet W, Carreras-Puigvert J, Feiertag K, Kamstra JH, Knapen D, Lichtenstein D, Marx-Stoelting P, Rietdijk J, Schubert K, Spjuth O, Stinckens E, Thedieck K, van den Boom R, Vergauwen L, von Bergen M, Wewer N, and Zalko D
Abstract: In the past, the analysis of endocrine disrupting properties of chemicals has mainly focused on (anti-)estrogenic or (anti-)androgenic properties, as well as on aspects of steroidogenesis and the modulation of thyroid signaling. More recently, disruption of energy metabolism and related signaling pathways by exogenous substances, so-called metabolism-disrupting chemicals (MDCs), has come into focus. While general effects such as body and organ weight changes are routinely monitored in animal studies, there is a clear lack of mechanistic test systems to determine and characterize the metabolism-disrupting potential of chemicals. To help fill this gap, one of the projects within the EU-funded Partnership for the Assessment of Risks from Chemicals (PARC) aims to develop novel in vitro methods for the detection of endocrine metabolic disruptors. Efforts will comprise projects related to specific signaling pathways, for example, involving mTOR or xenobiotic-sensing nuclear receptors, studies on hepatocytes, adipocytes and pancreatic beta cells covering metabolic and morphological endpoints, as well as metabolism-related zebrafish-based tests as an alternative to classic rodent bioassays. This paper provides an overview of the approaches and methods of these PARC projects and how they will contribute to the improvement of the toxicological toolbox to identify substances with endocrine disrupting properties and to decipher their mechanisms of action. Competing Interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The handling editor TV declared a shared research group, PARC PROJECT: European Partnership on the Assessment of Risks from Chemicals (PARC), with the authors at the time of review. (Copyright © 2023 Braeuning, Balaguer, Bourguet, Carreras-Puigvert, Feiertag, Kamstra, Knapen, Lichtenstein, Marx-Stoelting, Rietdijk, Schubert, Spjuth, Stinckens, Thedieck, van den Boom, Vergauwen, von Bergen, Wewer and Zalko.)
Published: 2023
Full Text: View/download PDF

28. Merging bioactivity predictions from cell morphology and chemical fingerprint models using similarity to training data.

Author: Seal S, Yang H, Trapotsi MA, Singh S, Carreras-Puigvert J, Spjuth O, and Bender A
Abstract: The applicability domain of machine learning models trained on structural fingerprints for the prediction of biological endpoints is often limited by the lack of diversity of chemical space of the training data. In this work, we developed similarity-based merger models which combined the outputs of individual models trained on cell morphology (based on Cell Painting) and chemical structure (based on chemical fingerprints) with the structural and morphological similarities of the compounds in the test dataset to compounds in the training dataset. We implemented these similarity-based merger models as logistic regression models using the predictions and similarities as features, and predicted assay hit calls for 177 assays from ChEMBL, PubChem and the Broad Institute (where the required Cell Painting annotations were available). We found that the similarity-based merger models outperformed other models, with roughly 20% more assays (79 out of 177) reaching an AUC > 0.70, compared with 65 out of 177 assays using structural models and 50 out of 177 assays using Cell Painting models. Our results demonstrate that similarity-based merger models combining structure and cell morphology models can more accurately predict a wide range of biological assay outcomes and further expand the applicability domain by better extrapolating to new structural and morphology spaces. (© 2023. The Author(s).)
Published: 2023
Full Text: View/download PDF

29. Disease phenotype prediction in multiple sclerosis.

Author: Herman S, Arvidsson McShane S, Zjukovskaja C, Khoonsari PE, Svenningsson A, Burman J, Spjuth O, and Kultima K
Abstract: Progressive multiple sclerosis (PMS) is currently diagnosed retrospectively. Here, we work toward a set of biomarkers that could assist in early diagnosis of PMS. A selection of cerebrospinal fluid metabolites (n = 15) was shown to differentiate between PMS and its preceding phenotype in an independent cohort (AUC = 0.93). Complementing the classifier with conformal prediction showed that highly confident predictions could be made, and that three out of eight patients developing PMS within three years of sample collection were predicted as PMS at that time point. Finally, this methodology was applied to PMS patients as part of a clinical trial for intrathecal treatment with rituximab. The methodology showed that 68% of the patients decreased their similarity to the PMS phenotype one year after treatment. In conclusion, the inclusion of confidence predictors contributes more information than traditional machine learning, and this information is relevant for disease monitoring. Competing Interests: The authors declare no competing interests. (© 2023 The Authors.)
Published: 2023
Full Text: View/download PDF

30. In Silico Prediction of Human Clinical Pharmacokinetics with ANDROMEDA by Prosilico: Predictions for an Established Benchmarking Data Set, a Modern Small Drug Data Set, and a Comparison with Laboratory Methods.

Author: Fagerholm U, Hellberg S, Alvarsson J, and Spjuth O
Subjects: Animals, Humans, Permeability, Pharmacokinetics, Pharmaceutical Preparations, Computer Simulation, Benchmarking, Models, Biological
Abstract: There is an ongoing aim to replace animal and in vitro laboratory models with in silico methods. Such replacement requires the successful validation and comparably good performance of the alternative methods. We have developed ANDROMEDA, an in silico prediction system for human clinical pharmacokinetics based on machine learning, conformal prediction and a new physiologically-based pharmacokinetic model. The objectives of this study were: a) to evaluate how well ANDROMEDA predicts the human clinical pharmacokinetics of a previously proposed benchmarking data set comprising 24 physicochemically diverse drugs and of 28 small drug molecules new to the market in 2021; b) to compare its predictive performance with that of laboratory methods; and c) to investigate and describe the pharmacokinetic characteristics of the modern drugs. Median and maximum prediction errors for the selected major parameters were ca. 1.2- to 2.5-fold and 16-fold, respectively, for both data sets. Prediction accuracy was on par with, or better than, the best laboratory-based prediction methods (superior performance for a vast majority of the comparisons), and the prediction range was considerably broader. The modern drugs have a higher average molecular weight than those in the benchmarking set from 15 years earlier (ca. 200 g/mol higher), and were predicted to (generally) have relatively complex pharmacokinetics, including permeability and dissolution limitations and significant renal, biliary and/or gut-wall elimination. In conclusion, the results were overall better than those obtained with laboratory methods, and thus serve to further validate the ANDROMEDA in silico system for the prediction of human clinical pharmacokinetics of modern and physicochemically diverse drugs.
Published: 2023
Full Text: View/download PDF

31. Estimating diagnostic uncertainty in artificial intelligence assisted pathology using conformal prediction.

Author: Olsson H, Kartasalo K, Mulliqi N, Capuccini M, Ruusuvuori P, Samaratunga H, Delahunt B, Lindskog C, Janssen EAM, Blilie A, Egevad L, Spjuth O, and Eklund M
Subjects: Male, Humans, Uncertainty, Prostate, Biopsy, Artificial Intelligence, Neoplasms
Abstract: Unreliable predictions can occur when an artificial intelligence (AI) system is presented with data it has not been exposed to during training. We demonstrate the use of conformal prediction to detect unreliable predictions, using histopathological diagnosis and grading of prostate biopsies as an example. We digitized 7788 prostate biopsies from 1192 men in the STHLM3 diagnostic study, used for training, and 3059 biopsies from 676 men used for testing. With conformal prediction, 1 in 794 (0.1%) predictions is incorrect for cancer diagnosis (compared to 14 errors [2%] without conformal prediction) while 175 (22%) of the predictions are flagged as unreliable when the AI system is presented with new data from the same lab and scanner that it was trained on. With small samples (N = 49 for external scanner, N = 10 for external lab and scanner, and N = 12 for external lab, scanner and pathology assessment), conformal prediction could detect systematic differences in external data leading to worse predictive performance. The AI system with conformal prediction commits 3 (2%) errors for cancer detection in cases of atypical prostate tissue compared to 44 (25%) without conformal prediction, while the system flags 143 (80%) unreliable predictions. We conclude that conformal prediction can increase the patient safety of AI systems. (© 2022. The Author(s).)
Published: 2022
Full Text: View/download PDF

32. From biomedical cloud platforms to microservices: next steps in FAIR data and analysis.

Author: Sheffield NC, Bonazzi VR, Bourne PE, Burdett T, Clark T, Grossman RL, Spjuth O, and Yates AD
Published: 2022
Full Text: View/download PDF

33. The Impact of Reference Data Selection for the Prediction Accuracy of Intrinsic Hepatic Metabolic Clearance.

Author: Fagerholm U, Spjuth O, and Hellberg S
Subjects: Humans, Kinetics, Metabolic Clearance Rate, Microsomes, Liver metabolism, Hepatocytes metabolism
Abstract: In vitro-in vivo prediction results for hepatic metabolic clearance (CLH) and intrinsic CLH (CLint) vary widely among studies. The reasons are not fully investigated or understood. One possible explanation is the option to select favorable reference data for in vivo CLH and CLint and for the unbound fraction in plasma (fu). The main objective was to investigate how reference data selection influences the correlations (r²) between log in vitro and in vivo CLint. Another aim was to make a head-to-head comparison with an in silico prediction method. Human hepatocyte CLint data for 15 compounds from two studies were selected. These were correlated to in vivo CLint estimated using different reported CLH and fu estimates. Depending on the choice of reference data, r² from the two studies ranged from 0.07 to 0.86 and from 0.06 to 0.79. When using average reference estimates, an r² of 0.62 was achieved. Inclusion of two outliers in one of the studies resulted in an r² of 0.38, which was lower than the predictive accuracy (q²) of the in silico method (0.48). In conclusion, the selection of reference data appears to play a major role in demonstrated predictions, and the in silico method showed higher accuracy and a wider range than hepatocytes for human in vivo CLint predictions. Competing Interests: Declaration of Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. (Copyright © 2022. Published by Elsevier Inc.)
Published: 2022
Full Text: View/download PDF

34. In Silico Predictions of the Gastrointestinal Uptake of Macrocycles in Man Using Conformal Prediction Methodology.

Author: Fagerholm U, Hellberg S, Alvarsson J, and Spjuth O
Subjects: Administration, Oral, Caco-2 Cells, Computer Simulation, Humans, Permeability, Pharmaceutical Preparations, Solubility, Intestinal Absorption, Models, Biological
Abstract: The gastrointestinal uptake of macrocyclic compounds is not fully understood. Here we applied our previously validated integrated system based on machine learning and conformal prediction to predict the passive fraction absorbed (fa), maximum fraction dissolved (fdiss), substrate specificities for major efflux transporters, and total fraction absorbed (fa,tot) in vivo in man for a selected set of designed macrocyclic compounds (n = 37; MW 407-889 g/mol) and macrocyclic drugs (n = 16; MW 734-1203 g/mol). Major aims were to increase the understanding of oral absorption of macrocycles and to further validate our methodology. We predicted designed macrocycles to have high fa and low to high fdiss and fa,tot, and average estimates were higher than for the larger macrocyclic drugs. With few exceptions, compounds were predicted to be effluxed and well absorbed. A 2-fold median prediction error for fa,tot was achieved for macrocycles (validation set). Advantages of our methodology include that it enables predictions for macrocycles with low permeability, Caco-2 recovery and solubility (BCS IV), provides prediction intervals, and guides optimization of absorption. The understanding of oral absorption of macrocycles was increased and the methodology was validated for prediction of the uptake of macrocycles in man. Competing Interests: Declaration of Competing Interest: The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Urban Fagerholm, Sven Hellberg and Ola Spjuth declare shares in Prosilico AB, a Swedish company that develops solutions for human clinical ADME/PK predictions. Ola Spjuth declares shares in Aros Bio AB, a company developing the CPSign software. (Copyright © 2022 American Pharmacists Association. Published by Elsevier Inc. All rights reserved.)
Published: 2022
Full Text: View/download PDF

35. Integrating cell morphology with gene expression and chemical structure to aid mitochondrial toxicity detection.

Author: Seal S, Carreras-Puigvert J, Trapotsi MA, Yang H, Spjuth O, and Bender A
Subjects: Gene Expression, Biological Assay, Drug Discovery methods
Abstract: Mitochondrial toxicity is an important safety endpoint in drug discovery. Models based solely on chemical structure for predicting mitochondrial toxicity are currently limited in accuracy, and their applicability domain is limited to the chemical space of the training compounds. In this work, we aimed to utilize both -omics and chemical data to push beyond the state of the art. We combined Cell Painting and Gene Expression data with chemical structural information from Morgan fingerprints for 382 chemical perturbants tested in the Tox21 mitochondrial membrane depolarization assay. We observed that mitochondrial toxicants differ from non-toxic compounds in morphological space and identified compound clusters having similar mechanisms of mitochondrial toxicity, thereby indicating that morphological space provides biological insights related to mechanisms of action of this endpoint. We further showed that models combining Cell Painting, Gene Expression features and Morgan fingerprints improved model performance on an external test set of 244 compounds by 60% (in terms of F1 score) and improved extrapolation to new chemical space. The performance of our combined models was comparable with dedicated in vitro assays for mitochondrial toxicity. Our results suggest that combining chemical descriptors with biological readouts enhances the detection of mitochondrial toxicants, with practical implications in drug discovery. (© 2022. The Author(s).)
Published: 2022
Full Text: View/download PDF

36. A method for Boolean analysis of protein interactions at a molecular level.

Author: Raykova D, Kermpatsou D, Malmqvist T, Harrison PJ, Sander MR, Stiller C, Heldin J, Leino M, Ricardo S, Klemm A, David L, Spjuth O, Vemuri K, Dimberg A, Sundqvist A, Norlin M, Klaesson A, Kampf C, and Söderberg O
Subjects: Signal Transduction, Protein Interaction Mapping methods, Proteins metabolism
Abstract: Determining the levels of protein-protein interactions is essential for the analysis of signaling within the cell, characterization of mutation effects, protein function and activation in health and disease, among others. Herein, we describe MolBoolean, a method to detect interactions between endogenous proteins in various subcellular compartments, utilizing antibody-DNA conjugates for identification and signal amplification. In contrast to proximity ligation assays, MolBoolean simultaneously indicates the relative abundances of protein A and B not interacting with each other, as well as the pool of A and B proteins that are proximal enough to be considered an AB complex. MolBoolean is applicable both in fixed cells and tissue sections. The specific and quantifiable data that the method generates provide opportunities for both diagnostic use and medical research., (© 2022. The Author(s).)
Published: 2022
Full Text: View/download PDF

37. Morphological profiling of environmental chemicals enables efficient and untargeted exploration of combination effects.

Author: Rietdijk J, Aggarwal T, Georgieva P, Lapins M, Carreras-Puigvert J, and Spjuth O
Subjects: Cetrimonium, Humans, Benzhydryl Compounds toxicity
Abstract: Environmental chemicals are commonly studied one at a time, and there is a need to advance our understanding of the effects of exposure to their combinations. Here we apply high-content microscopy imaging of cells stained with multiplexed dyes (Cell Painting) to profile the effects of cetyltrimethylammonium bromide (CTAB), bisphenol A (BPA), and dibutyltin dilaurate (DBTDL) exposure on four human cell lines, both individually and in all combinations. We show that morphological features can be used with multivariate data analysis to discern between exposures from individual compounds, concentrations, and combinations. CTAB and DBTDL induced concentration-dependent morphological changes across the four cell lines, and BPA exacerbated morphological effects when combined with CTAB and DBTDL. Combined exposure to CTAB and BPA induced changes in the ER, Golgi apparatus, nucleoli and cytoplasmic RNA in one of the cell lines. Different responses between cell lines indicate that multiple cell types are needed when assessing combination effects. The rapid and relatively low-cost experiments combined with high information content make Cell Painting an attractive methodology for future studies of combination effects. All data in the study are publicly available on Figshare., Competing Interests: Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2022 The Authors. Published by Elsevier B.V. All rights reserved.)
Published: 2022
Full Text: View/download PDF

38. SimVec: predicting polypharmacy side effects for new drugs.

Author: Lukashina N, Kartysheva E, Spjuth O, Virko E, and Shpilman A
Abstract: Polypharmacy refers to the administration of multiple drugs on a daily basis. It has demonstrated effectiveness in treating many complex diseases, but it carries a higher risk of adverse drug reactions. Hence, the prediction of polypharmacy side effects is an essential step in drug testing, especially for new drugs. This paper shows that the current state-of-the-art approach to polypharmacy side effect prediction, which is based on knowledge graphs (KGs), does not work well for new drugs, as they have few known connections in the KG. We propose a new method, SimVec, that solves this problem by enhancing the KG structure with a structure-aware node initialization and weighted drug similarity edges. We also devise a new 3-step learning process, which iteratively updates node embeddings related to side effect edges, similarity edges, and drugs with limited knowledge. Our model significantly outperforms existing KG-based models. Additionally, we examine the problem of negative relation generation and show that the cache-based approach works best for polypharmacy tasks., (© 2022. The Author(s).)
Published: 2022
Full Text: View/download PDF

39. Predicting protein network topology clusters from chemical structure using deep learning.

Author: Sreenivasan AP, Harrison PJ, Schaal W, Matuszewski DJ, Kultima K, and Spjuth O
Abstract: Comparing chemical structures to infer protein targets and functions is a common approach, but basing comparisons on chemical similarity alone can be misleading. Here we present a methodology for predicting target protein clusters using deep neural networks. The model is trained on clusters of compounds based on similarities calculated from combined compound-protein and protein-protein interaction data using a network topology approach. We compare several deep learning architectures including both convolutional and recurrent neural networks. The best performing method, the recurrent neural network architecture MolPMoFiT, achieved an F1 score approaching 0.9 on a held-out test set of 8907 compounds. In addition, in-depth analysis on a set of eleven well-studied chemical compounds with known functions showed that predictions were justifiable for all but one of the chemicals. Four of the compounds, similar in their molecular structure but with dissimilarities in their function, revealed advantages of our method compared to using chemical similarity., (© 2022. The Author(s).)
Published: 2022
Full Text: View/download PDF

40. Migrating to Long-Read Sequencing for Clinical Routine BCR-ABL1 TKI Resistance Mutation Screening.

Author: Schaal W, Ameur A, Olsson-Strömberg U, Hermanson M, Cavelier L, and Spjuth O
Abstract: Objective: The aim of this project was to implement long-read sequencing for BCR-ABL1 TKI resistance mutation screening in a clinical setting for patients undergoing treatment for chronic myeloid leukemia., Materials and Methods: Processes were established for registering and transferring samples from the clinic to an academic sequencing facility for long-read sequencing. An automated analysis pipeline for detecting mutations was established, and an information system was implemented comprising features for data management, analysis and visualization. Clinical validation was performed by identifying BCR-ABL1 TKI resistance mutations by Sanger and long-read sequencing in parallel. The developed software is available as open source via GitHub at https://github.com/pharmbio/clamp., Results: The information system enabled traceable transfer of samples from the clinic to the sequencing facility, robust and automated analysis of the long-read sequence data, and communication of results from sequence analysis in a reporting format that could be easily interpreted and acted upon by clinical experts. In a validation study, all 17 resistance mutations found by Sanger sequencing were also detected by long-read sequencing. An additional 16 mutations were found only by long-read sequencing, all of them with frequencies below the limit of detection for Sanger sequencing. The clonal distributions of co-existing mutations were automatically resolved through the long-read data analysis. After the implementation and validation, the clinical laboratory switched their routine protocol from using Sanger to long-read sequencing for this application., Conclusions: Long-read sequencing delivers results with higher sensitivity compared to Sanger sequencing and enables earlier detection of emerging TKI resistance mutations. 
The developed processes, analysis workflow, and software components lower barriers for adoption and could be extended to other applications., Competing Interests: Declaration of Conflicting Interests: The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Authors WS, AA, and OS are involved with Pincer Bio AB, a company formed as a result of the work presented herein to further develop and distribute LR-SMS analysis software., (© The Author(s) 2022.)
Published: 2022
Full Text: View/download PDF

41. An Open-Source Modular Framework for Automated Pipetting and Imaging Applications.

Author: Ouyang W, Bowman RW, Wang H, Bumke KE, Collins JT, Spjuth O, Carreras-Puigvert J, and Diederich B
Subjects: Automation methods, Humans, Microscopy, Reproducibility of Results, Algorithms, Software
Abstract: The number of samples in biological experiments is continuously increasing, but complex protocols and human error in many cases lead to suboptimal data quality and hence difficulties in reproducing scientific findings. Laboratory automation can alleviate many of these problems by precisely reproducing machine-readable protocols. These instruments generally require high up-front investments, and due to the lack of open application programming interfaces (APIs), they are notoriously difficult for scientists to customize and control outside of the vendor-supplied software. Here, automated, high-throughput experiments are demonstrated for interdisciplinary research in life science that can be replicated on a modest budget, using open tools to ensure reproducibility by combining the tools OpenFlexure, Opentrons, ImJoy, and UC2. This automated sample preparation and imaging pipeline can easily be replicated and established in many laboratories as well as in educational contexts through easy-to-understand algorithms and easy-to-build microscopes. Additionally, the creation of feedback loops, with later pipetting or imaging steps depending on the analysis of previously acquired images, enables the realization of fully autonomous "smart" microscopy experiments. All documents and source files are publicly available to prove the concept of smart lab automation using inexpensive, open tools. It is believed this democratizes access to the power and repeatability of automated experiments., (© 2021 The Authors. Advanced Biology published by Wiley-VCH GmbH.)
Published: 2022
Full Text: View/download PDF

42. In silico predictions of the human pharmacokinetics/toxicokinetics of 65 chemicals from various classes using conformal prediction methodology.

Author: Fagerholm U, Hellberg S, Alvarsson J, and Spjuth O
Subjects: Biological Availability, Computer Simulation, Humans, Kinetics, Pharmaceutical Preparations, Toxicokinetics, Models, Biological, Pharmacokinetics
Abstract: Pharmacokinetic/toxicokinetic (PK/TK) information for chemicals in humans is generally lacking. Here we applied machine learning, conformal prediction and a new physiologically-based PK/TK model for prediction of the human PK/TK of 65 chemicals from different classes, including carcinogens, food constituents and preservatives, vitamins, sweeteners, dyes and colours, pesticides, alternative medicines, flame retardants, psychoactive drugs, dioxins, poisons, UV-absorbents, surfactants, solvents and cosmetics. About 80% of the main human PK/TK data (fraction absorbed, oral bioavailability, half-life, unbound fraction in plasma, clearance, volume of distribution, fraction excreted) for the selected chemicals was missing from the literature. This missing information was now added using in silico predictions. Median and mean prediction errors for these parameters were 1.3- to 2.7-fold and 1.4- to 4.8-fold, respectively. In total, 59% and 86% of predictions had errors <2- and <5-fold, respectively. Predicted and observed PK/TK for the chemicals were generally within the range for pharmaceutical drugs. The results validated the new integrated system for prediction of the human PK/TK of different chemicals and added important missing information. No general difference in PK/TK characteristics was found between the selected chemicals and pharmaceutical drugs.
Published: 2022
Full Text: View/download PDF

43. In silico prediction of volume of distribution of drugs in man using conformal prediction performs on par with animal data-based models.

Author: Fagerholm U, Hellberg S, Alvarsson J, Arvidsson McShane S, and Spjuth O
Subjects: Animals, Drug Discovery, Models, Animal, Pharmacokinetics, Rats, Models, Biological, Pharmaceutical Preparations
Abstract: Volume of distribution at steady state (Vss) is an important pharmacokinetic endpoint. In this study we apply machine learning and conformal prediction for human Vss prediction, and make a head-to-head comparison with rat-to-man scaling, allometric scaling and the Rodgers-Lukova method on combined in silico and in vitro data, using a test set of 105 compounds with experimentally observed Vss. The mean prediction error and % with <2-fold prediction error for our method were 2.4-fold and 64%, respectively. 69% of test compounds had an observed Vss within the prediction interval at a 70% confidence level. In comparison, 2.2-, 2.9- and 3.1-fold mean errors and 69, 64 and 61% of predictions with <2-fold error were reached with rat-to-man scaling, allometric scaling and the Rodgers-Lukova method, respectively. We conclude that our method has theoretically proven validity that was empirically confirmed, and shows predictive accuracy on par with animal models and superior to an alternative widely used in silico-based method. The option for the user to select the level of confidence in predictions offers better guidance on how to optimise Vss in drug discovery applications.
Published: 2021
Full Text: View/download PDF
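The fold-error and interval-coverage statistics quoted in the Vss abstract can be made concrete with a short calculation. This Python sketch is a hedged illustration, not the authors' code: it assumes fold error is defined symmetrically as max(predicted/observed, observed/predicted), that "mean prediction error" is an arithmetic mean of fold errors, and that an interval covers an observation when the observed value lies within its bounds. The function names and toy values are hypothetical.

```python
from statistics import mean, median


def fold_error(pred: float, obs: float) -> float:
    """Symmetric fold error (assumed definition): 1.0 is perfect, 2.0 is a 2-fold miss."""
    return max(pred / obs, obs / pred)


def summarize(preds, obs, intervals):
    """Summarize point-prediction errors and prediction-interval coverage."""
    errors = [fold_error(p, o) for p, o in zip(preds, obs)]
    below_2fold = sum(e < 2 for e in errors) / len(errors)
    covered = sum(lo <= o <= hi for o, (lo, hi) in zip(obs, intervals)) / len(obs)
    return {
        "median_fold_error": median(errors),
        "mean_fold_error": mean(errors),  # arithmetic mean (an assumption)
        "fraction_below_2fold": below_2fold,
        "interval_coverage": covered,
    }


# Hypothetical toy data, not values from the study.
preds = [1.0, 2.0, 0.5, 4.0]
obs = [1.0, 1.0, 1.0, 1.0]
intervals = [(0.5, 2.0), (0.5, 1.5), (0.8, 3.0), (2.0, 8.0)]
print(summarize(preds, obs, intervals))
```

For a conformal predictor, interval coverage is the figure to compare against the chosen confidence level (70% in the study above); coverage well below that level would point to a miscalibrated model.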

44. scConnect: a method for exploratory analysis of cell-cell communication based on single-cell RNA-sequencing data.

Author: Jakobsson JET, Spjuth O, and Lagerström MC
Abstract: Motivation: Cell-to-cell communication is critical for all multicellular organisms, and single-cell sequencing facilitates the construction of full connectivity graphs between cell types in tissues. Such complex data structures demand novel analysis methods and tools for exploratory analysis., Results: We propose a method to predict the putative ligand-receptor interactions between cell types from single-cell RNA-sequencing data. This is achieved by inferring and incorporating interactions in a multi-directional graph, thereby enabling contextual exploratory analysis. We demonstrate that our approach can detect common and specific interactions between cell types in mouse brain and human tumors, and that these interactions fit with expected outcomes. These interactions also include predictions made with molecular ligands integrating information from several types of genes necessary for ligand production and transport. Our implementation is general and can be appended to any transcriptome analysis pipeline to provide unbiased hypothesis generation regarding ligand to receptor interactions between cell populations or for network analysis in silico., Availability and Implementation: scConnect is open source and available as a Python package at https://github.com/JonETJakobsson/scConnect. scConnect is directly compatible with Scanpy scRNA-sequencing pipelines., Supplementary Information: Supplementary data are available at Bioinformatics online., (© The Author(s) 2021. Published by Oxford University Press.)
Published: 2021
Full Text: View/download PDF

45. Integrating Statistical and Machine-Learning Approach for Meta-Analysis of Bisphenol A-Exposure Datasets Reveals Effects on Mouse Gene Expression within Pathways of Apoptosis and Cell Survival.

Author: Lukashina N, Williams MJ, Kartysheva E, Virko E, Kudłak B, Fredriksson R, Spjuth O, and Schiöth HB
Subjects: Air Pollutants, Occupational toxicity, Animals, Cell Survival, Datasets as Topic, Gene Expression Profiling, Liver drug effects, Male, Meta-Analysis as Topic, Mice, Apoptosis, Benzhydryl Compounds toxicity, Biomarkers metabolism, Gene Expression Regulation drug effects, Liver metabolism, Machine Learning, Models, Statistical, Phenols toxicity
Abstract: Bisphenols are important environmental pollutants that are extensively studied due to different detrimental effects, while the molecular mechanisms behind these effects are less well understood. Like other environmental pollutants, bisphenols are being tested in various experimental models, creating large expression datasets found in open access storage. The meta-analysis of such datasets is, however, very complicated for various reasons. Here, we developed an integrating statistical and machine-learning model approach for the meta-analysis of bisphenol A (BPA) exposure datasets from different mouse tissues. We constructed three joint datasets following three different strategies for dataset integration: in particular, using all common genes from the datasets, uncorrelated genes, and genes that are not co-expressed, respectively. By applying machine learning methods to these datasets, we identified genes whose expression was significantly affected in all of the BPA microanalysis data tested; those involved in the regulation of cell survival include: Tnfr2, Hgf-Met, Agtr1a, Bdkrb2; signaling through Mapk8 (Jnk1); DNA repair (Hgf-Met, Mgmt); apoptosis (Tmbim6, Bcl2, Apaf1); and cellular junctions (F11r, Cldnd1, Ctnd1 and Yes1). Our results highlight the benefit of combining existing datasets for the integrated analysis of a specific topic when individual datasets are limited in size.
Published: 2021
Full Text: View/download PDF

46. Synergy conformal prediction applied to large-scale bioactivity datasets and in federated learning.

Author: Norinder U, Spjuth O, and Svensson F
Abstract: Confidence predictors can deliver predictions with the associated confidence required for decision making and can play an important role in drug discovery and toxicity predictions. In this work we investigate a recently introduced version of conformal prediction, synergy conformal prediction, focusing on the predictive performance when applied to bioactivity data. We compare the performance to other variants of conformal predictors for multiple partitioned datasets and demonstrate the utility of synergy conformal predictors for federated learning where data cannot be pooled in one location. Our results show that synergy conformal predictors based on training data randomly sampled with replacement can compete with other conformal setups, while using completely separate training sets often results in worse performance. However, in a federated setup where no method has access to all the data, synergy conformal prediction is shown to give promising results. Based on our study, we conclude that synergy conformal predictors are a valuable addition to the conformal prediction toolbox., (© 2021. The Author(s).)
Published: 2021
Full Text: View/download PDF

47. Comparison between lab variability and in silico prediction errors for the unbound fraction of drugs in human plasma.

Author: Fagerholm U, Spjuth O, and Hellberg S
Subjects: Computer Simulation, Humans, Models, Biological, Protein Binding, Pharmaceutical Preparations, Plasma
Abstract: Variability of the unbound fraction in plasma (fu) between labs, methods and conditions is known to exist. Variability and uncertainty of this parameter influence predictions of the overall pharmacokinetics of drug candidates and might jeopardise safety in early clinical trials. Objectives of this study were to evaluate the variability of human in vitro fu estimates between labs for a range of different drugs, and to develop and validate an in silico fu prediction method and compare the results to the lab variability. A new in silico method with prediction accuracy (Q²) of 0.69 for log fu was developed. The median and maximum prediction errors were 1.9- and 92-fold, respectively. Corresponding estimates for lab variability (ratio between max and min fu for each compound) were 2.0- and 185-fold, respectively. Greater than 10-fold lab variability was found for 14 of 117 selected compounds. Comparisons demonstrate that in silico predictions were about as reliable as lab estimates when these have been generated under different conditions. The results suggest that the new validated in silico prediction method is valuable not only for predictions at the drug design stage, but also for reducing uncertainties of fu estimations and improving safety of drug candidates entering the clinical phase.
Published: 2021
Full Text: View/download PDF

48. The machine learning life cycle and the cloud: implications for drug discovery.

Author: Spjuth O, Frid J, and Hellander A
Subjects: Animals, Cloud Computing, Drug Discovery, Humans, Life Cycle Stages, Artificial Intelligence, Machine Learning
Abstract: Introduction: Artificial intelligence (AI) and machine learning (ML) are increasingly used in many aspects of drug discovery. Larger data sizes and methods such as Deep Neural Networks contribute to challenges in data management, the required software stack, and computational infrastructure. There is an increasing need in drug discovery to continuously re-train models and make them available in production environments. Areas covered: This article describes how cloud computing can aid the ML life cycle in drug discovery. The authors discuss opportunities with containerization and scientific workflows and introduce the concept of MLOps and describe how it can facilitate reproducible and robust ML modeling in drug discovery organizations. They also discuss ML on private, sensitive and regulated data. Expert opinion: Cloud computing offers a compelling suite of building blocks to sustain the ML life cycle integrated in iterative drug discovery. Containerization and platforms such as Kubernetes together with scientific workflows can enable reproducible and resilient analysis pipelines, and the elasticity and flexibility of cloud infrastructures enables scalable and efficient access to compute resources. Drug discovery commonly involves working with sensitive or private data, and cloud computing and federated learning can contribute toward enabling collaborative drug discovery within and between organizations. Abbreviations: AI = Artificial Intelligence; DL = Deep Learning; GPU = Graphics Processing Unit; IaaS = Infrastructure as a Service; K8S = Kubernetes; ML = Machine Learning; MLOps = Machine Learning and Operations; PaaS = Platform as a Service; QC = Quality Control; SaaS = Software as a Service.
Published: 2021
Full Text: View/download PDF

49. A phenomics approach for antiviral drug discovery.

Author: Rietdijk J, Tampere M, Pettke A, Georgiev P, Lapins M, Warpman-Berglund U, Spjuth O, Puumalainen MR, and Carreras-Puigvert J
Subjects: Cell Line, Dose-Response Relationship, Drug, Drug Evaluation, Preclinical methods, Humans, SARS-CoV-2 physiology, Antiviral Agents pharmacology, Drug Discovery methods, Phenomics methods, SARS-CoV-2 drug effects
Abstract: Background: The emergence and continued global spread of the current COVID-19 pandemic has highlighted the need for methods to identify novel or repurposed therapeutic drugs in a fast and effective way. Despite the availability of methods for the discovery of antiviral drugs, the majority tend to focus on the effects of such drugs on a given virus, its constituent proteins, or enzymatic activity, often neglecting the consequences on host cells. This may lead to partial assessment of the efficacy of the tested anti-viral compounds, as potential toxicity impacting the overall physiology of host cells may mask the effects of both viral infection and drug candidates. Here we present a method able to assess the general health of host cells based on morphological profiling, for untargeted phenotypic drug screening against viral infections., Results: We combine Cell Painting with antibody-based detection of viral infection in a single assay. We designed an image analysis pipeline for segmentation and classification of virus-infected and non-infected cells, followed by extraction of morphological properties. We show that this methodology can successfully capture virus-induced phenotypic signatures of MRC-5 human lung fibroblasts infected with human coronavirus 229E (CoV-229E). Moreover, we demonstrate that our method can be used in phenotypic drug screening using a panel of nine host- and virus-targeting antivirals. Treatment with effective antiviral compounds reversed the morphological profile of the host cells towards a non-infected state., Conclusions: The phenomics approach presented here, which makes use of a modified Cell Painting protocol by incorporating an anti-virus antibody stain, can be used for the unbiased morphological profiling of virus infection on host cells. 
The method can identify antiviral reference compounds, as well as novel antivirals, demonstrating its suitability to be implemented as a strategy for antiviral drug repurposing and drug discovery., (© 2021. The Author(s).)
Published: 2021
Full Text: View/download PDF

50. Machine Learning Strategies When Transitioning between Biological Assays.

Author: Arvidsson McShane S, Ahlberg E, Noeske T, and Spjuth O
Subjects: Molecular Conformation, Retrospective Studies, Biological Assay, Machine Learning
Abstract: Machine learning is widely used in drug development to predict activity in biological assays based on chemical structure. However, the process of transitioning from one experimental setup to another for the same biological endpoint has not been extensively studied. In a retrospective study, we here explore different modeling strategies of how to combine data from the old and new assays when training conformal prediction models using data from hERG and NaV assays. We suggest continuously monitoring the validity and efficiency of models as more data is accumulated from the new assay and selecting a modeling strategy based on these metrics. In order to maximize the utility of data from the old assay, we propose a strategy that augments the proper training set of an inductive conformal predictor by adding data from the old assay but only having data from the new assay in the calibration set, which results in valid (well-calibrated) models with improved efficiency compared to other strategies. We study the results for varying sizes of new and old assays, allowing for discussion of different practical scenarios. We also conclude that our proposed assay transition strategy is more beneficial, and the value of data from the new assay is higher, for the harder case of regression compared to classification problems.
Published: 2021
Full Text: View/download PDF
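The assay-transition strategy in the abstract above (old-assay data added only to the proper training set, new-assay data alone in the calibration set) can be sketched for regression as follows. This is a minimal Python illustration under stated assumptions: a one-variable least-squares line stands in for the actual machine-learning model, absolute residuals serve as the nonconformity measure, and all function names and data are hypothetical rather than taken from the study.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for any ML model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return lambda x: (my - b * mx) + b * x


def icp_interval_halfwidth(cal_scores, confidence):
    """Conformal quantile of absolute-residual nonconformity scores."""
    scores = sorted(cal_scores)
    k = math.ceil((len(scores) + 1) * confidence)  # rank of the quantile
    if k > len(scores):
        return math.inf  # too few calibration points for this confidence
    return scores[k - 1]


def assay_transition_icp(old, new_train, new_cal, confidence=0.8):
    # Proper training set: old-assay data augmented with part of the new assay.
    train = old + new_train
    model = fit_line([x for x, _ in train], [y for _, y in train])
    # Calibration set: new-assay data only, so validity refers to the new assay.
    cal_scores = [abs(y - model(x)) for x, y in new_cal]
    half = icp_interval_halfwidth(cal_scores, confidence)
    return lambda x: (model(x) - half, model(x) + half)


# Hypothetical data: (input, assay readout) pairs.
old = [(x, 2 * x + 1) for x in range(10)]
new_train = [(x, 2 * x + 1) for x in range(10, 15)]
offsets = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9]
new_cal = [(x, 2 * x + 1 + d) for x, d in zip(range(15, 24), offsets)]
predict = assay_transition_icp(old, new_train, new_cal, confidence=0.8)
print(predict(30))  # approximately (60.2, 61.8)
```

Because calibration uses only new-assay examples, the coverage guarantee of the inductive conformal predictor applies to new-assay test compounds; the old-assay data can only affect efficiency (interval width), which matches the trade-off the abstract describes.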
