
Identification of terms for detecting early signals of emerging infectious disease outbreaks on the web

Authors :
Mathieu Roche
Elena Arsevska
Pascal Hendrikx
Sylvain Falala
David Chavernac
Renaud Lancelot
Barbara Dufour
Contrôle des maladies animales exotiques et émergentes (UMR CMAEE)
Institut National de la Recherche Agronomique (INRA)-Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)
ADVanced Analytics for data SciencE (ADVANSE)
Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM)
Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)
Territoires, Environnement, Télédétection et Information Spatiale (UMR TETIS)
Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-AgroParisTech-Institut national de recherche en sciences et technologies pour l'environnement et l'agriculture (IRSTEA)-Centre National de la Recherche Scientifique (CNRS)
Direction des Laboratoires (UCAS)
Agence nationale de sécurité sanitaire de l'alimentation, de l'environnement et du travail (ANSES)
École nationale vétérinaire d'Alfort (ENVA)
Source :
Computers and Electronics in Agriculture, Elsevier, 2016, 123, pp. 104-115. ⟨10.1016/j.compag.2016.02.010⟩
Publication Year :
2016
Publisher :
HAL CCSD, 2016.

Abstract

Integrated approach to identify terms for monitoring disease emergence on the web.
Terms are extracted automatically from disease outbreak web pages.
Domain experts identify the terms relevant to characterise disease emergence.
Relevant terms are used as queries to mine the web.

Timeliness and precision in detecting infectious animal disease outbreaks from information published on the web are crucial for preventing their spread. The work in this paper is part of a web-monitoring methodology that we are currently developing for the French epidemic intelligence team in animal health. We focus on new and exotic infectious animal diseases that occur worldwide and are a potential threat to animal health in France.

To detect relevant information on the web, we present an innovative approach that retrieves documents using queries built from terms that are automatically extracted from a corpus of relevant documents and validated by a consensus of domain experts (Delphi method). As a decision-support tool for domain experts, we introduce a new measure for ranking the extracted terms so as to highlight the most relevant ones. To categorise the documents retrieved from the web, we use Naive Bayes (NB) and Support Vector Machine (SVM) classifiers.

We evaluated our approach on documents about African swine fever (ASF) outbreaks for the period from 2011 to 2014, retrieved from the Google search engine and the PubMed database. Of the 2400 terms extracted from two corpora of relevant ASF documents, 135 were relevant to characterise ASF emergence.
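
The abstract names Naive Bayes (NB) as one of the two classifiers used to sort retrieved documents into relevant and irrelevant categories, but gives no implementation details. The following is a minimal, generic multinomial NB sketch with add-one (Laplace) smoothing, written from scratch in Python. The whitespace tokenizer, the toy training texts and the class labels "relevant"/"irrelevant" are illustrative assumptions, not the authors' features, data or code.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenisation, for illustration only.
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        doc_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # Class priors: share of training documents in each class.
        self.log_prior = {c: math.log(doc_counts[c] / len(docs)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # Smoothed P(token | class): (count + 1) / (class word total + |vocabulary|).
        return math.log((self.word_counts[c][token] + 1) /
                        (self.total_words[c] + len(self.vocab)))

    def predict(self, doc):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        scores = {c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
                  for c in self.classes}
        return max(scores, key=scores.get)


# Hypothetical toy corpus: outbreak reports vs economic/general news.
train_docs = [
    "african swine fever outbreak pigs died fever haemorrhagic",
    "new asf outbreak reported mortality in wild boar",
    "pork prices fall export ban economic impact",
    "market trade pork economy prices",
]
train_labels = ["relevant", "relevant", "irrelevant", "irrelevant"]

clf = MultinomialNB().fit(train_docs, train_labels)
print(clf.predict("asf outbreak with high mortality in pigs"))  # relevant
print(clf.predict("pork export prices drop"))                   # irrelevant
```

Working in log space avoids numerical underflow when many word probabilities are multiplied; an SVM, the other classifier named above, would instead learn a separating hyperplane over the same kind of term-count features.
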

The domain experts identified the terms describing mortality, fever and haemorrhagic clinical signs in Suidae as highly specific for characterising ASF emergence. The new ranking measure correctly ranked the relevant ASF terms up to position 161 and fairly well up to position 227, with areas under the ROC curve (AUCs) of 0.802 and 0.709, respectively. Both classifiers were accurate in classifying a set of 545 ASF documents into the appropriate categories of relevant (disease outbreak) and irrelevant (economic and general) documents (NB: 0.747; SVM: 0.725).

Our results show that relevant documents can serve as a source of terms for detecting infectious animal disease emergence on the web. Our method is generic and can be used in both the animal and public health domains.
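
The ranking results above are reported as areas under the ROC curve (AUC). For a ranked list of terms with binary expert labels, the AUC equals the fraction of (relevant, irrelevant) term pairs in which the relevant term is ranked higher. The sketch below computes that quantity, optionally restricted to the top k positions. The function name `ranking_auc`, the `top_k` parameter and the toy terms are hypothetical, and restricting the computation to a prefix of the list is only an assumption about how the figures at positions 161 and 227 were obtained.

```python
def ranking_auc(ranked_terms, relevant, top_k=None):
    """Pairwise AUC of a strict ranking (best term first).

    Counts the (relevant, irrelevant) pairs in which the relevant term
    appears earlier in the list, divided by the total number of such pairs.
    If top_k is given, only the first top_k terms are considered.
    """
    terms = ranked_terms if top_k is None else ranked_terms[:top_k]
    n_pos = 0          # relevant terms seen so far
    n_neg = 0          # irrelevant terms seen so far
    correct_pairs = 0  # pairs where a relevant term precedes an irrelevant one
    for term in terms:
        if term in relevant:
            n_pos += 1
        else:
            n_neg += 1
            # Every relevant term already seen outranks this irrelevant one.
            correct_pairs += n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one relevant and one irrelevant term")
    return correct_pairs / (n_pos * n_neg)


# Hypothetical ranked output and expert labels.
ranked = ["fever", "mortality", "price", "haemorrhagic", "export", "pig"]
relevant = {"fever", "mortality", "haemorrhagic", "pig"}

print(ranking_auc(ranked, relevant))           # 0.625
print(ranking_auc(ranked, relevant, top_k=4))  # 0.666...
```

With the toy list, 5 of the 8 relevant/irrelevant pairs are ordered correctly, giving 0.625. Ties are not handled: a ranking measure that assigns equal scores to several terms would need the usual half-credit for tied pairs.
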

Details

Language :
English
ISSN :
0168-1699
Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....470293c31d274578f3bfd7fbb4f78243
Full Text :
https://doi.org/10.1016/j.compag.2016.02.010