1. $\nabla^2$DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials
- Author
Khrabrov, Kuzma; Ber, Anton; Tsypin, Artem; Ushenin, Konstantin; Rumiantsev, Egor; Telepov, Alexander; Protasov, Dmitry; Shenbin, Ilya; Alekseev, Anton; Shirokikh, Mikhail; Nikolenko, Sergey; Tutubalina, Elena; and Kadurin, Artur
- Subjects
Physics - Chemical Physics, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Methods of computational quantum chemistry provide accurate approximations of molecular properties that are crucial for computer-aided drug discovery and other areas of chemical science. However, their high computational complexity limits the scalability of their applications. Neural network potentials (NNPs) are a promising alternative to quantum chemistry methods, but they require large and diverse datasets for training. This work presents $\nabla^2$DFT, a new dataset and benchmark based on nablaDFT. Compared with nablaDFT, it contains twice as many molecular structures and three times as many conformations, along with new data types and tasks and state-of-the-art models. The dataset includes energies, forces, 17 molecular properties, Hamiltonian and overlap matrices, and a wavefunction object. For each conformation, all calculations were performed with DFT at the $\omega$B97X-D/def2-SVP level. Moreover, $\nabla^2$DFT is the first dataset that contains relaxation trajectories for a substantial number of drug-like molecules. We also introduce a novel benchmark for evaluating NNPs on molecular property prediction, Hamiltonian prediction, and conformational optimization tasks. Finally, we propose an extensible framework for training NNPs and implement 10 models within it.
- Published
- 2024
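The abstract lists Hamiltonian and overlap matrices among the stored quantities and Hamiltonian prediction as a benchmark task. One common way to use such a pair of matrices is to recover orbital energies by solving the generalized eigenproblem $HC = SC\varepsilon$. The sketch below does this with symmetric (Löwdin) orthogonalization in NumPy. It assumes `H` and `S` are already loaded as real symmetric arrays in the same atomic-orbital basis, with `S` positive definite; it does not reflect the dataset's actual file format or loading API, and the function name `orbital_energies` is hypothetical.

```python
import numpy as np


def orbital_energies(H: np.ndarray, S: np.ndarray) -> np.ndarray:
    """Return the eigenvalues eps of the generalized problem H C = S C eps.

    Hypothetical helper: assumes H and S are real symmetric matrices in the
    same basis and that S is positive definite.
    """
    # Diagonalize the overlap matrix: S = U diag(s) U^T
    s_vals, s_vecs = np.linalg.eigh(S)
    # Loewdin orthogonalizer X = S^{-1/2}, built from the eigendecomposition
    X = s_vecs @ np.diag(1.0 / np.sqrt(s_vals)) @ s_vecs.T
    # In the orthonormal basis, X^T H X has the same eigenvalues as the generalized problem
    H_orth = X.T @ H @ X
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    return np.linalg.eigvalsh(H_orth)


# Toy two-orbital example (illustrative numbers, not taken from the dataset)
H = np.array([[-1.0, -0.5], [-0.5, -1.0]])
S = np.array([[1.0, 0.2], [0.2, 1.0]])
print(orbital_energies(H, S))
```

For this symmetric two-orbital case the exact roots are $(\alpha+\beta)/(1+s) = -1.25$ and $(\alpha-\beta)/(1-s) = -0.625$, so the call prints approximately `[-1.25 -0.625]`. Orthogonalizing through `S` keeps the example NumPy-only; `scipy.linalg.eigh(H, S)` solves the same generalized problem directly.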