1. Efficient implementation of atom-density representations
- Author
- Guillaume Fraux, Markus Stricker, Michael J. Willatt, Alexander Goscinski, Max Veit, Félix Musil, Michele Ceriotti, and Till Junge
- Subjects
Theoretical computer science, Discretization, Computer science, Feature vector, General Physics and Astronomy, FOS: Physical sciences, Context (language use), Molecular dynamics, 010402 general chemistry, 01 natural sciences, Interatomic potentials, Position (vector), Physics - Chemical Physics, 0103 physical sciences, Machine learning, Computational methods, Physical and Theoretical Chemistry, Representation (mathematics), Chemical Physics (physics.chem-ph), Condensed Matter - Materials Science, 500 Naturwissenschaften und Mathematik::530 Physik::539 Moderne Physik, 010304 chemical physics, Basis (linear algebra), Materials Science (cond-mat.mtrl-sci), 0104 chemical sciences, Range (mathematics), Equivariant map
- Abstract
Physically-motivated and mathematically robust atom-centred representations of molecular structures are key to the success of modern atomistic machine learning (ML) methods. They lie at the foundation of a wide range of methods to predict the properties of both materials and molecules as well as to explore and visualize the chemical compound and configuration space. Recently, it has become clear that many of the most effective representations share a fundamental formal connection: that they can all be expressed as a discretization of N-body correlation functions of the local atom density, suggesting the opportunity of standardizing and, more importantly, optimizing the calculation of such representations. We present an implementation, named librascal, whose modular design lends itself both to developing refinements to the density-based formalism and to rapid prototyping for new developments of rotationally equivariant atomistic representations. As an example, we discuss SOAP features, perhaps the most widely used member of this family of representations, to show how the expansion of the local density can be optimized for any choice of radial basis set. We discuss the representation in the context of a kernel ridge regression model, commonly used with SOAP features, and analyze how the computational effort scales for each of the individual steps of the calculation. By applying data reduction techniques in feature space, we show how to further reduce the total computational cost by up to a factor of 4 or 5 without affecting the model's symmetry properties and without significantly impacting its accuracy.
- Published
- 2022
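
The abstract's last two steps, reducing the feature space and then fitting a kernel ridge regression model, can be illustrated with a minimal NumPy sketch. This is not librascal's API, and the abstract does not say which data reduction technique is used; farthest point sampling over feature columns is assumed here as one common choice. The random matrix `X` stands in for rotationally invariant feature vectors, and the names `fps_columns`, `dot_kernel`, `krr_fit`, `zeta`, and `reg` are all hypothetical.

```python
import numpy as np


def fps_columns(X, n_select, start=0):
    """Pick n_select feature columns of X (n_samples, n_features) by
    farthest point sampling. Assumed reduction scheme, for illustration."""
    selected = [start]
    # Squared distance from every column to its closest selected column.
    d2 = np.sum((X - X[:, [start]]) ** 2, axis=0)
    for _ in range(n_select - 1):
        nxt = int(np.argmax(d2))
        selected.append(nxt)
        d2 = np.minimum(d2, np.sum((X - X[:, [nxt]]) ** 2, axis=0))
    return np.array(selected)


def dot_kernel(A, B, zeta=2):
    """Normalized dot-product kernel raised to the power zeta."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return (A @ B.T) ** zeta


def krr_fit(K, y, reg=1e-8):
    """Solve (K + reg * I) w = y for the kernel ridge regression weights."""
    return np.linalg.solve(K + reg * np.eye(len(y)), y)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 40))  # placeholder for per-structure features
y = rng.normal(size=50)        # placeholder target property

cols = fps_columns(X, n_select=10)  # keep 10 of the 40 features
X_red = X[:, cols]
K = dot_kernel(X_red, X_red)
weights = krr_fit(K, y)
y_pred = K @ weights
```

Because the reduction only drops components of the feature vector, each retained component keeps whatever invariance it had, which is consistent with the abstract's statement that the model's symmetry properties are unaffected. The kernel evaluation cost then scales with the number of retained features rather than the full feature count.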