Development of a global infectious disease activity database using natural language processing, machine learning, and human expertise

Authors :
Joshua Feldman
Andrea Thomas-Bachli
Zaki Hasnain Patel
Kamran Khan
Jack Forsyth
Source :
J Am Med Inform Assoc
Publication Year :
2019
Publisher :
Oxford University Press (OUP), 2019.

Abstract

Objective: We assessed whether machine learning can be utilized to allow efficient extraction of infectious disease activity information from online media reports.

Materials and Methods: We curated a data set of labeled media reports (n = 8322) indicating which articles contain updates about disease activity. We trained a classifier on this data set. To validate our system, we used a held out test set and compared our articles to the World Health Organization Disease Outbreak News reports.

Results: Our classifier achieved a recall and precision of 88.8% and 86.1%, respectively. The overall surveillance system detected 94% of the outbreaks identified by the WHO covered by online media (89%) and did so 43.4 (IQR: 9.5–61) days earlier on average.

Discussion: We constructed a global real-time disease activity database surveilling 114 illnesses and syndromes. We must further assess our system for bias, representativeness, granularity, and accuracy.

Conclusion: Machine learning, natural language processing, and human expertise can be used to efficiently identify disease activity from digital media reports.
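For readers unfamiliar with the metrics reported in the Results, the following is a minimal Python sketch of how recall and precision are computed for a binary classifier on a held-out test set. It is not the authors' code: the function names and the toy labels are hypothetical, and only the two percentages (88.8% recall, 86.1% precision) come from the abstract. The F1 value at the end is derived from those two figures and is not reported in the record.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 = 'disease activity update'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy test set (not data from the study).
toy_true = [1, 1, 1, 0, 0, 1, 0, 1]
toy_pred = [1, 1, 0, 0, 1, 1, 0, 1]
toy_precision, toy_recall = precision_recall(toy_true, toy_pred)

# F1 implied by the precision and recall reported in the abstract.
reported_f1 = f1_score(0.861, 0.888)
print(f"toy precision={toy_precision:.2f}, toy recall={toy_recall:.2f}")
print(f"F1 implied by reported metrics: {reported_f1:.3f}")
```

Recall measures how many of the true disease activity reports the classifier retrieves, while precision measures how many of the retrieved reports are true updates. In a surveillance setting, missed reports and false alarms carry different costs, which is why both figures are given.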

Details

ISSN :
1527-974X and 1067-5027
Volume :
26
Database :
OpenAIRE
Journal :
Journal of the American Medical Informatics Association
Accession number :
edsair.doi.dedup.....a9ba8b56fa73a7c8df8791379851a049
Full Text :
https://doi.org/10.1093/jamia/ocz112