Towards Foundation Models for Materials Science: The Open MatSci ML Toolkit

Authors :
Lee, Kin Long Kelvin
Gonzales, Carmelo
Spellings, Matthew
Galkin, Mikhail
Miret, Santiago
Kumar, Nalini
Publication Year :
2023

Abstract

Artificial intelligence and machine learning have shown great promise in their ability to accelerate novel materials discovery. As researchers and domain scientists seek to unify and consolidate chemical knowledge, the case for models with potential to generalize across different tasks within materials science - so-called "foundation models" - grows with ambitions. This manuscript reviews our recent progress with development of Open MatSci ML Toolkit, and details experiments that lay the groundwork for foundation model research and development with our framework. First, we describe and characterize a new pretraining task that uses synthetic data generated from symmetry operations, and reveal complex training dynamics at large scales. Using the pretrained model, we discuss a number of use cases relevant to foundation model development: semantic architecture of datasets, and fine-tuning for property prediction and classification. Our key results show that for simple applications, pretraining appears to provide worse modeling performance than training models from random initialization. However, for more complex instances, such as when a model is required to learn across multiple datasets and types of targets simultaneously, the inductive bias from pretraining provides significantly better performance. This insight will hopefully inform subsequent efforts into creating foundation models for materials science applications.

Comment: 17 pages, 7 figures, 1 table. Accepted paper/presentation at the AI4Science workshop at Super Computing '23

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2310.07864
Document Type :
Working Paper
Full Text :
https://doi.org/10.1145/3624062.3626081