Development of a global infectious disease activity database using natural language processing, machine learning, and human expertise
- Source :
- J Am Med Inform Assoc
- Publication Year :
- 2019
- Publisher :
- Oxford University Press (OUP), 2019.
Abstract
- Objective: We assessed whether machine learning can be utilized to allow efficient extraction of infectious disease activity information from online media reports. Materials and Methods: We curated a data set of labeled media reports (n = 8322) indicating which articles contain updates about disease activity. We trained a classifier on this data set. To validate our system, we used a held out test set and compared our articles to the World Health Organization Disease Outbreak News reports. Results: Our classifier achieved a recall and precision of 88.8% and 86.1%, respectively. The overall surveillance system detected 94% of the outbreaks identified by the WHO covered by online media (89%) and did so 43.4 (IQR: 9.5–61) days earlier on average. Discussion: We constructed a global real-time disease activity database surveilling 114 illnesses and syndromes. We must further assess our system for bias, representativeness, granularity, and accuracy. Conclusion: Machine learning, natural language processing, and human expertise can be used to efficiently identify disease activity from digital media reports.
- Subjects :
- Databases, Factual
Computer science
Information Storage and Retrieval
Health Informatics
Global Health
Machine learning
Communicable Diseases
Health informatics
Disease Outbreaks
Digital media
Machine Learning
User-Computer Interface
Public health surveillance
Humans
Natural Language Processing
Database
Infectious disease (medical specialty)
Population Surveillance
Test set
The Internet
Artificial intelligence
Brief Communications
Precision and recall
Classifier (UML)
Natural language processing
Details
- ISSN :
- 1527-974X and 1067-5027
- Volume :
- 26
- Database :
- OpenAIRE
- Journal :
- Journal of the American Medical Informatics Association
- Accession number :
- edsair.doi.dedup.....a9ba8b56fa73a7c8df8791379851a049
- Full Text :
- https://doi.org/10.1093/jamia/ocz112