Classification of estrogenic compounds by coupling high content analysis and machine learning algorithms
- Source :
- PLoS Computational Biology, Vol 16, Iss 9, p e1008191 (2020)
- Publication Year :
- 2020
Abstract
- Environmental toxicants affect human health in various ways. Of the thousands of chemicals present in the environment, those with adverse effects on the endocrine system are referred to as endocrine-disrupting chemicals (EDCs). Here, we focused on a subclass of EDCs that impacts the estrogen receptor (ER), a pivotal transcriptional regulator in health and disease. Estrogenic activity of compounds can be measured by many in vitro or cell-based high throughput assays that record various endpoints from large pools of cells, and increasingly at the single-cell level. To simultaneously capture multiple mechanistic ER endpoints in individual cells that are affected by EDCs, we previously developed a sensitive high throughput/high content imaging assay that is based upon a stable cell line harboring a visible multicopy ER responsive transcription unit and expressing a green fluorescent protein (GFP) fusion of ER. High content analysis generates voluminous multiplex data composed of minable features that describe numerous mechanistic endpoints. In this study, we present a machine learning pipeline for rapid, accurate, and sensitive assessment of the endocrine-disrupting potential of benchmark chemicals based on data generated from high content analysis. The multidimensional imaging data were used to train a classification model to ultimately predict the impact of unknown compounds on the ER, either as agonists or antagonists. To this end, both linear logistic regression and nonlinear Random Forest classifiers were benchmarked and evaluated for predicting the estrogenic activity of unknown compounds. Furthermore, through feature selection, data visualization, and model discrimination, the most informative features were identified for the classification of ER agonists/antagonists.
The results of this data-driven study showed that highly accurate and generalizable classification models can be constructed with a minimal number of features, and that these machine learning models serve as a means for rapid mechanistic/phenotypic evaluation of the estrogenic potential of many chemicals.
Author summary
- Chemical contaminants or toxicants pose environmental and health-related risks upon exposure. The ability to rapidly understand their biological impact, specifically on a key modulator of important physiological and pathological states in the human body, is essential for diagnosing and avoiding undesirable health outcomes during environmental emergencies. In this study, we use advanced data analytics to create statistical models that can accurately predict the endocrinological activity of toxic chemicals based on high throughput/high content image analysis data. We focus on a subclass of chemicals that affect the estrogen receptor (ER), which is a pivotal transcriptional regulator in health and disease. The multidimensional imaging data of these benchmark chemicals are used to train a classification model to ultimately predict the impact of unknown compounds on the ER, either as agonists or antagonists. To this end, we evaluate linear and nonlinear classifiers for predicting the estrogenic activity of unknown compounds and use feature selection, data visualization, and model discrimination methodologies to identify the most informative features for the classification of ER agonists/antagonists.
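The benchmarking idea summarized in the abstract (a linear logistic regression against a nonlinear Random Forest, followed by a reduction to the most informative features) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes scikit-learn, uses a synthetic dataset in place of the high content imaging features, and the names `benchmark` and `top_features` as well as all parameter values are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for per-compound imaging features (NOT the study's data).
# Labels 0/1 play the role of two activity classes, e.g. agonist vs. antagonist.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, n_redundant=2, random_state=0
)

# The two classifier families named in the abstract; settings are assumptions.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}


def benchmark(X, y, cv=5):
    """Return the mean cross-validated accuracy of each model."""
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean())
        for name, model in models.items()
    }


def top_features(X, y, k=5):
    """Rank features by Random Forest importance and return the top k indices."""
    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1]
    return [int(i) for i in order[:k]]


scores_all = benchmark(X, y)

# For brevity the ranking uses the full dataset; a careful evaluation would
# rank features inside each cross-validation fold to avoid optimistic bias.
selected = top_features(X, y, k=5)
scores_reduced = benchmark(X[:, selected], y)

print("all features:    ", scores_all)
print("top-5 features:  ", scores_reduced)
print("selected indices:", selected)
```

Comparing `scores_all` with `scores_reduced` mirrors the question the abstract raises, namely whether a small feature subset retains the predictive accuracy of the full feature set. Importance-based ranking is only one of several possible feature selection methods and is used here as an assumption.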
- Subjects :
- 0301 basic medicine
Gene Expression
Estrogenic Compounds
Biochemistry
Machine Learning
0302 clinical medicine
Mathematical and Statistical Techniques
Cluster Analysis
Multiplex
Biology (General)
Data Management
Principal Component Analysis
Ecology
Chromosome Biology
Statistics
Chromatin
Random forest
Computational Theory and Mathematics
Receptors, Estrogen
Modeling and Simulation
High-content screening
Physical Sciences
Epigenetics
Algorithms
Research Article
Statistical Distributions
Computer and Information Sciences
QH301-705.5
Imaging Techniques
Feature selection
Image Analysis
Biology
Research and Analysis Methods
Imaging data
Cell Line
03 medical and health sciences
Cellular and Molecular Neuroscience
Artificial Intelligence
Genetics
Humans
Statistical Methods
Hierarchical Clustering
Molecular Biology
Ecology, Evolution, Behavior and Systematics
Data Visualization
Biology and Life Sciences
Estrogens
Cell Biology
Probability Theory
Hormones
030104 developmental biology
Multivariate Analysis
030217 neurology & neurosurgery
Mathematics
Details
- ISSN :
- 15537358
- Volume :
- 16
- Issue :
- 9
- Database :
- OpenAIRE
- Journal :
- PLoS computational biology
- Accession number :
- edsair.doi.dedup.....735209d46f25752b2966ab0ff847adc8