Reducing crystal structure overprediction: from small rigid molecules to conformationally complex drugs
- Author
- Francia, Nicholas Francesco
- Abstract
In the pharmaceutical industry, controlling a new drug's crystal form is key to optimising its formulation and mode of action. Computational Crystal Structure Prediction (CSP) methods for organic crystalline materials are becoming increasingly accurate at predicting the relative stability of packings, but they usually grossly overestimate the number of polymorphs. The purpose of this work is to develop a systematic and scalable method that reduces CSP sets to a small number of putative polymorphs by including temperature effects. Not all hypothetical structures corresponding to local minima in the lattice energy landscape are expected to be stable at finite temperature: many of them merge into a smaller set of persistent states. To identify persistent structures, classical molecular dynamics (MD) simulations at finite temperature are performed on CSP-generated crystal structures. Unstable structures are automatically removed by checking whether molecules exhibit a random inter-molecular orientation, typical of the melted state. To identify structures that convert to the same geometry, I devised a clustering analysis based on probabilistic fingerprints that provide information on the relative position, relative orientation and conformation of molecules within a dynamic crystal supercell. These molecule-specific fingerprints efficiently distinguish different structures in large supercells and robustly handle the displacement of atoms from their equilibrium positions that is typical of finite-temperature simulations. They are used to quantitatively assess the similarity between pairs of structures and to cluster analogous geometries. Finally, I used Well-Tempered Metadynamics on the cluster centres to overcome the limits of MD and sample possible slow transitions. I applied this method to molecules of increasing conformational complexity and to datasets ranging from a few dozen to thousands of structures.
Instrumental in achieving scalability over a large set of crystal structures has been the development of a Python library that handles the setup of MD simulations and automatically analyses the resulting trajectories, enabling us to manage the large sets of structures typical of real-world CSP applications.
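The idea of comparing probabilistic fingerprints can be illustrated with a minimal sketch. It is not the library developed in this work: it assumes each fingerprint is a histogram of relative molecular orientations, binned so that random orientations give a flat distribution, and it uses the Hellinger distance and a greedy threshold clustering as stand-ins for whatever similarity measure and clustering algorithm are actually employed. All function names, thresholds and histogram values are hypothetical.

```python
import math

def normalize(hist):
    """Turn raw bin counts into a probability distribution."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must contain at least one count")
    return [h / total for h in hist]

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))

def is_melted(fp, tol=0.1):
    """Assumption: a fingerprint close to the flat distribution signals random orientations."""
    uniform = [1.0 / len(fp)] * len(fp)
    return hellinger(fp, uniform) < tol

def cluster(fingerprints, threshold):
    """Greedy clustering: a structure joins the first centre closer than the threshold."""
    centres = []  # indices of representative structures
    labels = []   # centre index assigned to each structure
    for i, fp in enumerate(fingerprints):
        for c in centres:
            if hellinger(fp, fingerprints[c]) < threshold:
                labels.append(c)
                break
        else:
            centres.append(i)
            labels.append(i)
    return centres, labels

# Hypothetical orientation histograms for four simulated structures
raw = [
    [90, 8, 2, 0],
    [88, 10, 2, 0],    # nearly identical to the first structure
    [0, 3, 7, 90],     # a distinct packing
    [25, 25, 25, 25],  # flat distribution, as expected for a melt
]
fps = [normalize(h) for h in raw]
melted = [is_melted(fp) for fp in fps]
centres, labels = cluster(fps, threshold=0.2)
```

In this toy example the first two structures fall into the same cluster, the third forms its own, and the fourth is flagged as melted and would be discarded before clustering in a real workflow.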
- Published
- 2022