1. Automated Workflows for Data Curation and Machine Learning to Develop Quantitative Structure-Activity Relationships.
- Author
-
Gadaleta D
- Subjects
- Software, Cheminformatics methods, Computational Biology methods, Quantitative Structure-Activity Relationship, Machine Learning, Workflow, Data Curation methods
- Abstract
The recent advancements in machine learning and the new availability of large chemical datasets made the development of tools and protocols for computational chemistry a topic of high interest. In this chapter a standard procedure to develop Quantitative Structure-Activity Relationship (QSAR) models was presented and implemented in two freely available and easy-to-use workflows. The first workflow helps the user retrieving chemical data (SMILES) from the web, checking their correctness and curating them to produce consistent and ready-to-use datasets for cheminformatic. The second workflow implements six machine learning methods to develop classification QSAR models. Models can be additionally used to predict external chemicals. Calculation and selection of chemical descriptors, tuning of models' hyperparameters, and methods to handle data unbalancing are also incorporated in the workflow. Both the workflows are implemented in KNIME and represent a useful tool for computational scientists, as well as an intuitive and straightforward introduction to QSAR., (© 2025. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.)
- Published
- 2025
- Full Text
- View/download PDF