Cautionary Guidelines for Machine Learning Studies with Combinatorial Datasets
- Source :
- ACS Combinatorial Science. 22:586-591
- Publication Year :
- 2020
- Publisher :
- American Chemical Society (ACS), 2020.
Abstract
- Regression modeling is becoming increasingly prevalent in organic chemistry as a tool for reaction outcome prediction and mechanistic interrogation. Frequently, to acquire the requisite amount of data for such studies, researchers employ combinatorial datasets to maximize the number of data points while limiting the number of discrete chemical entities required. An often-overlooked problem in modeling studies using combinatorial datasets is the tendency to fit on patterns in the datasets (i.e., the presence or absence of a reactant or catalyst) rather than to identify meaningful trends between descriptors and the response variable. Consequently, the generality and interpretability of such models suffer. This report illustrates these well-known pitfalls in a case study, demonstrates the control experiments needed to identify when this tendency will be problematic, and suggests how to perform further validation to assess the general applicability and interpretability of models trained using combinatorial datasets.
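The abstract does not spell out which control experiments the report uses. As a minimal sketch of one widely used control, not necessarily the report's own, the code below assumes a synthetic full-factorial catalyst × substrate grid whose response is, by construction, linear in hypothetical "real" descriptors, and fits ordinary least squares with NumPy. It then replaces the descriptors with random vectors that encode only component identity and compares in-sample fit with a leave-one-catalyst-out split. All data, the helper names (`run`, `design`, `fit_predict`, `r2`) and the parameter values are illustrative assumptions.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination over all rows."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def design(cat_feat, sub_feat, cat_idx, sub_idx):
    """Look up per-component features for each row and append an intercept column."""
    ones = np.ones((cat_idx.size, 1))
    return np.hstack([cat_feat[cat_idx], sub_feat[sub_idx], ones])


def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares (minimum-norm solution if the design is rank deficient)."""
    coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return X_test @ coef


def run(n_cat=8, n_sub=6, d=3, noise=0.05, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical "real" descriptors; the response is linear in them by construction.
    cat_desc = rng.normal(size=(n_cat, d))
    sub_desc = rng.normal(size=(n_sub, d))
    w_cat, w_sub = rng.normal(size=d), rng.normal(size=d)

    # Full-factorial (combinatorial) grid: every catalyst paired with every substrate.
    grid = np.meshgrid(np.arange(n_cat), np.arange(n_sub), indexing="ij")
    cat_idx, sub_idx = grid[0].ravel(), grid[1].ravel()
    y = (cat_desc[cat_idx] @ w_cat + sub_desc[sub_idx] @ w_sub
         + noise * rng.normal(size=cat_idx.size))

    # Control: random "descriptors" that carry no chemistry, only component identity.
    # With one dimension per component they span the same space as one-hot encoding.
    cat_rand = rng.normal(size=(n_cat, n_cat))
    sub_rand = rng.normal(size=(n_sub, n_sub))

    results = {}
    for name, cf, sf in [("real", cat_desc, sub_desc), ("random", cat_rand, sub_rand)]:
        X = design(cf, sf, cat_idx, sub_idx)
        in_sample = r2(y, fit_predict(X, y, X))
        # Leave-one-catalyst-out: each catalyst is absent from training when predicted.
        pred = np.empty_like(y)
        for c in range(n_cat):
            test = cat_idx == c
            pred[test] = fit_predict(X[~test], y[~test], X[test])
        results[name] = (in_sample, r2(y, pred))
    return results


if __name__ == "__main__":
    for name, (fit, loco) in run().items():
        print(f"{name:>6}: in-sample R^2 = {fit:.3f}, leave-one-catalyst-out R^2 = {loco:.3f}")
```

On a full grid, the random-descriptor model fits at least as well in-sample as the descriptor model, because its column space contains every additive per-component pattern. Only when a whole catalyst is withheld does the difference appear: the identity-only features have nothing to say about an unseen catalyst. A random-descriptor fit that rivals the real one in-sample is therefore a warning that the model may be learning the dataset's combinatorial layout rather than a trend between descriptors and response.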
- Subjects :
- Databases, Factual
- Property (programming)
- Quantitative Structure-Activity Relationship
- 010402 general chemistry
- Machine learning
- computer.software_genre
- 01 natural sciences
- Catalysis
- Machine Learning
- Combinatorial Chemistry Techniques
- Sulfhydryl Compounds
- Interpretability
- Generality
- 010405 organic chemistry
- business.industry
- Chemistry
- Stereoisomerism
- Regression analysis
- General Chemistry
- General Medicine
- Limiting
- 0104 chemical sciences
- Variable (computer science)
- Data point
- Models, Chemical
- Imines
- Artificial intelligence
- Outcome prediction
- business
- computer
Details
- ISSN :
- 2156-8944 and 2156-8952
- Volume :
- 22
- Database :
- OpenAIRE
- Journal :
- ACS Combinatorial Science
- Accession number :
- edsair.doi.dedup.....8182fbb25027132a83930a51d5f78b7f
- Full Text :
- https://doi.org/10.1021/acscombsci.0c00118