1. Predicting Physiological Effects of Chemical Substances Using Natural Language Processing
- Author
Prashan Malla, Tim Oates, Hieu Trong Nguyen, Anh Thi Hoang Pham, JJ Ben-Joseph, Vasudevan Janarthanan, Marcelo Campos, and Sourav Mukherjee
- Subjects
Structure (mathematical logic), Computer science, business.industry, String (computer science), Feature extraction, Artificial intelligence, computer.software_genre, business, computer, Natural language processing
- Abstract
In this paper, we apply natural language processing methods to develop models for predicting physiological effects of chemical substances based on their molecular structures. Using string representations of structure as a starting point, we vectorize molecules using two different approaches resulting in sparse and dense vector representations, respectively. We use these representations to train predictive models for a variety of physiological effects such as toxicity, cell cycle arrest and proliferation. Using standard chemical datasets, we empirically demonstrate that such models can achieve high predictive accuracy.
- Published
- 2021
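The sparse vectorization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which string notation or tokenization the authors use, so everything here is an assumption: the molecules are written as SMILES strings, the features are character n-gram counts, and the helper names (`char_ngrams`, `build_vocabulary`, `sparse_vector`) are hypothetical and not taken from the paper.

```python
from collections import Counter

def char_ngrams(s, n_min=1, n_max=3):
    """Return all contiguous character n-grams of s for n in [n_min, n_max]."""
    return [s[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(s) - n + 1)]

def build_vocabulary(strings, n_min=1, n_max=3):
    """Assign each distinct n-gram an integer index, in order of first appearance."""
    vocab = {}
    for s in strings:
        for gram in char_ngrams(s, n_min, n_max):
            vocab.setdefault(gram, len(vocab))
    return vocab

def sparse_vector(s, vocab, n_min=1, n_max=3):
    """Map a string to {feature index: count}; n-grams outside the vocabulary are skipped."""
    counts = Counter(char_ngrams(s, n_min, n_max))
    return {vocab[gram]: c for gram, c in counts.items() if gram in vocab}

# Illustrative inputs only (assumed SMILES): ethanol, acetic acid, benzene.
molecules = ["CCO", "CC(=O)O", "c1ccccc1"]
vocab = build_vocabulary(molecules)
vectors = [sparse_vector(m, vocab) for m in molecules]
```

Each entry of `vectors` stores only the nonzero feature counts of one molecule, which is what makes the representation sparse. A dense representation would instead map each molecule to a fixed-length vector of real values, for example through a learned embedding; that step is not sketched here because the abstract gives no detail on it.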