Application of a developed triple-classification machine learning model for carcinogenic prediction of hazardous organic chemicals to the US, EU, and WHO based on Chinese database.
- Source :
- Ecotoxicology and environmental safety [Ecotoxicol Environ Saf] 2023 Apr 15; Vol. 255, pp. 114806. Date of Electronic Publication: 2023 Mar 20.
- Publication Year :
- 2023
Abstract
- Cancer, the second-largest human disease, has become a major public health problem. Predicting a chemical's carcinogenicity before it is synthesized is therefore crucial. In this paper, seven machine learning algorithms (i.e., Random Forest (RF), Logistic Regression (LR), Support Vector Machines (SVM), Complement Naive Bayes (CNB), K-Nearest Neighbor (KNN), XGBoost, and Multilayer Perceptron (MLP)) were used to construct a carcinogenicity triple-classification prediction (TCP) model covering three classes (Category 1A, Category 1B, and Category 2). A total of 1444 descriptors for 118 hazardous organic chemicals were calculated with Discovery Studio 2020, Sybyl X-2.0, and PaDEL-Descriptor. The TCP models were evaluated with five metrics (Accuracy, Precision, Recall, F1 score, and AUC), and all five met the acceptance threshold (greater than 0.6). For Category 2, the RF, LR, XGBoost, and MLP models achieved prediction accuracies of 91.67%, 79.17%, 100%, and 100%, respectively. The constructed models also show potential for error correction. For example, the XGBoost model predicts 1,2,3-trichloropropane (CAS 96-18-4) as Category 2, whereas its actual classification is Category 1B; however, the difference between the model's outputs for Category 2 and Category 1B is only 0.004, indicating that XGBoost is the optimal model among the seven constructed. The results also suggest that functional groups such as chlorine atoms and benzene rings may influence the predicted carcinogenicity classification. We therefore recommend considering the functional-group characteristics of chemicals before constructing carcinogenicity prediction models for organic chemicals. The carcinogenicity predicted by the optimal model (XGBoost) was further evaluated and verified through toxicokinetic analysis.
The RF and XGBoost TCP models constructed in this paper can be used to screen new organic substances for carcinogenicity before they are synthesized, and they provide technical support for the subsequent management of organic chemicals.
Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
(Copyright © 2023 The Authors. Published by Elsevier Inc. All rights reserved.)
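The five evaluation metrics and the "near-miss" comparison described in the abstract can be sketched for a three-class problem as below. This is a minimal, dependency-free illustration, not the authors' code: the label strings, the function names (`argmax_labels`, `macro_metrics`, `ovr_macro_auc`, `top_two_margin`), macro averaging, and one-vs-rest AUC are all assumptions, since the abstract does not state how the metrics were averaged. The toy data are invented, and reading the reported 0.004 as a gap between class scores is likewise an assumption.

```python
# Hypothetical sketch: metrics for a three-class carcinogenicity classifier.
# Macro averaging and one-vs-rest AUC are assumptions, not the paper's method.

CLASSES = ["1A", "1B", "Category 2"]  # hypothetical label strings


def argmax_labels(proba, classes=CLASSES):
    """Turn per-class score rows into predicted labels (highest score wins)."""
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in proba]


def macro_metrics(y_true, y_pred, classes=CLASSES):
    """Accuracy plus macro-averaged precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n_classes = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1s) / n_classes,
    }


def ovr_macro_auc(y_true, proba, classes=CLASSES):
    """One-vs-rest AUC per class (pairwise ranking form), then averaged."""
    aucs = []
    for k, c in enumerate(classes):
        pos = [row[k] for t, row in zip(y_true, proba) if t == c]
        neg = [row[k] for t, row in zip(y_true, proba) if t != c]
        if not pos or not neg:
            continue  # AUC is undefined without both positives and negatives
        # Count positive-over-negative wins; ties count as half a win.
        wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
        aucs.append(wins / (len(pos) * len(neg)))
    return sum(aucs) / len(aucs) if aucs else float("nan")


def top_two_margin(row, classes=CLASSES):
    """Return (best class, runner-up class, score gap) for one prediction."""
    ranked = sorted(zip(row, classes), reverse=True)
    (s1, c1), (s2, c2) = ranked[0], ranked[1]
    return c1, c2, s1 - s2


# Toy data (invented for illustration, not from the paper).
y_true = ["1A", "1B", "Category 2", "Category 2", "1B", "1A"]
proba = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
    [0.2, 0.398, 0.402],  # a near-miss between "1B" and "Category 2"
    [0.5, 0.4, 0.1],
]
y_pred = argmax_labels(proba)
metrics = macro_metrics(y_true, y_pred)
metrics["auc"] = ovr_macro_auc(y_true, proba)
print(metrics)
print(top_two_margin(proba[4]))
```

A check such as `all(v > 0.6 for v in metrics.values())` mirrors the abstract's acceptance rule. In the toy data, the fifth chemical is misclassified as "Category 2" instead of "1B" by a small score gap, which is the kind of near-miss the abstract describes.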
- Subjects :
- Bayes Theorem
Carcinogenesis
Support Vector Machine
World Health Organization
Algorithms
United States
European Union
China
Databases, Factual
Carcinogens toxicity
Carcinogens chemistry
Hazardous Substances chemistry
Hazardous Substances toxicity
Machine Learning
Organic Chemicals toxicity
Organic Chemicals chemistry
Details
- Language :
- English
- ISSN :
- 1090-2414
- Volume :
- 255
- Database :
- MEDLINE
- Journal :
- Ecotoxicology and environmental safety
- Publication Type :
- Academic Journal
- Accession number :
- 36948010
- Full Text :
- https://doi.org/10.1016/j.ecoenv.2023.114806