1. A graph neural network approach for molecule carcinogenicity prediction
- Author
- Philip Fradkin, Adamo Young, Lazar Atanackovic, Brendan Frey, Leo J Lee, and Bo Wang
- Subjects
Statistics and Probability, Computational Mathematics, Computational Theory and Mathematics, Carcinogens, Animals, Neural Networks, Computer, Molecular Biology, Biochemistry, Computer Science Applications, Forecasting, Mutagens
- Abstract
Motivation: Molecular carcinogenicity is a preventable cause of cancer, but systematically identifying carcinogenic compounds, which involves performing experiments on animal models, is expensive, time-consuming and low-throughput. As a result, carcinogenicity information is limited, and building data-driven models with good prediction accuracy remains a major challenge.
Results: In this work, we propose CONCERTO, a deep learning model that uses a graph transformer in conjunction with a molecular fingerprint representation for carcinogenicity prediction from molecular structure. Special efforts have been made to overcome the data size constraint, such as multi-round pre-training on related but lower-quality mutagenicity data, and transfer learning from a large self-supervised model. Extensive experiments demonstrate that our model performs well and can generalize to external validation sets. CONCERTO could be useful for guiding future carcinogenicity experiments and could provide insight into the molecular basis of carcinogenicity.
Availability and implementation: The code and data underlying this article are available on GitHub at https://github.com/bowang-lab/CONCERTO
- Published
- 2022