Accelerating Formulation Design via Machine Learning: Generating a High-throughput Shampoo Formulations Dataset

Authors :
Aniket Chitre
Robert C. M. Querimit
Simon D. Rihm
Dogancan Karan
Benchuan Zhu
Ke Wang
Long Wang
Kedar Hippalgaonkar
Alexei A. Lapkin
Source :
Scientific Data, Vol 11, Iss 1, Pp 1-10 (2024)
Publication Year :
2024
Publisher :
Nature Portfolio, 2024.

Abstract

Liquid formulations are ubiquitous yet have lengthy product development cycles, because the complex physical interactions between ingredients make it difficult to tune formulations to customer-defined property targets. Interpolative ML models can accelerate liquid formulation design but are typically trained on limited sets of ingredients and without any structural information, which limits their out-of-training predictive capacity. To address this challenge, we selected eighteen formulation ingredients covering a diverse chemical space to prepare an open experimental dataset for training ML models for rinse-off formulation development. The resulting design space has over a 50-fold increase in dimensionality compared to our previous work. Here, we present a dataset of 812 formulations, including 294 stable samples, which cover the entire design space, with phase stability, turbidity, and high-fidelity rheology measurements generated on our semi-automated, ML-driven liquid formulations workflow. Our dataset has the unique attribute of sample-specific uncertainty measurements to train predictive surrogate models.

Subjects

Subjects :
Science

Details

Language :
English
ISSN :
2052-4463
Volume :
11
Issue :
1
Database :
Directory of Open Access Journals
Journal :
Scientific Data
Publication Type :
Academic Journal
Accession number :
edsdoj.8fa8db7a70894f60b16d6db492fee624
Document Type :
article
Full Text :
https://doi.org/10.1038/s41597-024-03573-w