1. Machine learning-based predictive modeling to identify genotypic traits associated with Salmonella enterica disease endpoints in isolates from ground chicken
- Author
Abani K. Pradhan, Patrick Murigu Kamau Njage, Shraddha Karanth, Jianghong Meng, and Collins K. Tanui
- Subjects
Whole genome sequencing, Salmonella, business.industry, Disease, Biology, medicine.disease_cause, biology.organism_classification, Machine learning, computer.software_genre, Genome, DNA sequencing, Predictive modeling, Random forest, Salmonella enterica, Genotype, medicine, Artificial intelligence, business, computer, Food Science
- Abstract
As the cost of genome sequencing of foodborne pathogens decreases, it has become possible to sequence large numbers of isolates and evaluate them using machine learning algorithms. This study aimed to use machine learning algorithms to predict disease endpoints for untagged Salmonella genome sequences isolated from ground chicken. Using a semi-supervised approach, our models recognized genetic patterns in the test dataset based on a training dataset obtained from an extensive literature review. With known genotypes as input features, the semi-supervised random forest model showed the highest overall accuracy of 0.94 (95% confidence interval: 0.85–0.99) and a Kappa value of 0.82, and predicted 87% of the disease endpoints. The model predicted genes associated with specific disease endpoints that were associated with virulence, which could be used as features in future predictive modeling. Our machine learning approach would be useful in different areas of food safety, including identifying pathogen sources, predicting antibiotic resistance, and assessing the risk posed by foodborne pathogens.
- Published
- 2022
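The abstract reports model performance as overall accuracy and a Kappa value. As a minimal sketch of how these two metrics relate, assuming Kappa here means Cohen's kappa (agreement corrected for chance), the Python below computes both from true and predicted class labels. The function names, the labels `endpoint_A` and `endpoint_B`, and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of isolates whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (the accuracy); p_e is the agreement
    expected by chance given the marginal class frequencies.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of P(true = c) * P(pred = c).
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Both label sets contain a single identical class; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy labels, for illustration only (not data from the study).
y_true = ["endpoint_A"] * 6 + ["endpoint_B"] * 4
y_pred = ["endpoint_A"] * 5 + ["endpoint_B"] + ["endpoint_B"] * 3 + ["endpoint_A"]

acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
print(f"accuracy = {acc:.2f}")  # accuracy = 0.80
print(f"kappa    = {kappa:.2f}")  # kappa    = 0.58
```

Kappa is lower than accuracy whenever some agreement would be expected by chance alone, which is why the two figures are often reported together for classifiers trained on imbalanced classes.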