1. DGL-LifeSci: An Open-Source Toolkit for Deep Learning on Graphs in Life Science
- Author
Mufei Li, Wenxuan Fan, Yangkang Zhang, Jiajing Hu, Yaxin Gu, George Karypis, and Jinjing Zhou
- Subjects
FOS: Computer and information sciences, Class (computer programming), Computer Science - Machine Learning, Theoretical computer science, Speedup, business.industry, General Chemical Engineering, Deep learning, General Chemistry, Python (programming language), Pipeline (software), Quantitative Biology - Quantitative Methods, Article, Machine Learning (cs.LG), Chemistry, FOS: Biological sciences, Code (cryptography), Artificial intelligence, Data pre-processing, business, QD1-999, Implementation, computer, Quantitative Methods (q-bio.QM), computer.programming_language
- Abstract
Graph neural networks (GNNs) constitute a class of deep learning methods for graph data. They have wide applications in chemistry and biology, such as molecular property prediction, reaction prediction, and drug-target interaction prediction. Despite this interest, GNN-based modeling remains challenging because, in addition to programming and deep learning, it requires graph data preprocessing and modeling. Here, we present Deep Graph Library (DGL)-LifeSci, an open-source Python toolkit for deep learning on graphs in life science, built on RDKit, PyTorch, and DGL. DGL-LifeSci supports GNN-based modeling on custom datasets for molecular property prediction, reaction prediction, and molecule generation. With its command-line interfaces, users can perform modeling without any background in programming or deep learning. We test the command-line interfaces on the standard benchmarks MoleculeNet, USPTO, and ZINC. Compared with previous implementations, DGL-LifeSci achieves a speedup of up to 6×. For modeling flexibility, DGL-LifeSci provides well-optimized modules for the various stages of the modeling pipeline. In addition, DGL-LifeSci provides pretrained models for reproducing the test experiment results and for applying models without training. The code is distributed under the Apache-2.0 License and is freely accessible at https://github.com/awslabs/dgl-lifesci.
- Published
- 2021
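The abstract notes that GNN-based modeling requires graph data preprocessing as well as modeling. The following is a minimal, dependency-free sketch of those two stages for a single small molecule, intended only to make the idea concrete. It does not use DGL-LifeSci, RDKit, PyTorch, or DGL: the function names (`build_graph`, `message_passing`, `readout`, `predict`), the three-symbol atom vocabulary, and the mean-aggregation layer are hypothetical illustrations of a generic GCN-style update, not the package's API or any model it ships.

```python
from typing import Dict, List, Tuple

# Hypothetical three-symbol atom vocabulary (an assumption for illustration);
# practical featurizers encode many more atom properties.
ATOM_TYPES = ["C", "N", "O"]

Vector = List[float]


def one_hot(symbol: str) -> Vector:
    """Encode an atom symbol as a one-hot vector over ATOM_TYPES."""
    return [1.0 if symbol == t else 0.0 for t in ATOM_TYPES]


def build_graph(
    atoms: List[str], bonds: List[Tuple[int, int]]
) -> Tuple[List[Vector], Dict[int, List[int]]]:
    """Preprocessing: turn atoms and bonds into node features plus an
    adjacency list. Each bond is stored in both directions."""
    feats = [one_hot(a) for a in atoms]
    adj: Dict[int, List[int]] = {i: [] for i in range(len(atoms))}
    for u, v in bonds:
        adj[u].append(v)
        adj[v].append(u)
    return feats, adj


def message_passing(
    feats: List[Vector], adj: Dict[int, List[int]], weight: List[Vector]
) -> List[Vector]:
    """One GCN-style layer (hypothetical): average each node's own and its
    neighbours' features, apply a linear map (rows of `weight`), then ReLU."""
    out = []
    for v in range(len(feats)):
        group = [feats[v]] + [feats[u] for u in adj[v]]
        mean = [sum(col) / len(group) for col in zip(*group)]
        transformed = [sum(w * x for w, x in zip(row, mean)) for row in weight]
        out.append([max(0.0, t) for t in transformed])
    return out


def readout(feats: List[Vector]) -> Vector:
    """Sum pooling: a fixed-size graph vector that ignores atom order."""
    return [sum(col) for col in zip(*feats)]


def predict(graph_vec: Vector, head: Vector, bias: float = 0.0) -> float:
    """Linear head mapping the graph vector to a scalar property score."""
    return sum(w * x for w, x in zip(head, graph_vec)) + bias


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    # Heavy atoms of ethanol (C-C-O); hydrogens are left implicit.
    feats, adj = build_graph(["C", "C", "O"], [(0, 1), (1, 2)])
    h = message_passing(feats, adj, identity)
    g = readout(h)
    print(g)
    print(predict(g, [0.1, 0.2, 0.3]))
```

Sum pooling is used here because it yields the same graph vector however the atoms are numbered, which is what a molecule-level property prediction needs. In DGL-LifeSci itself these stages are handled by the package's own featurizers and model modules; its documentation describes the actual interfaces.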