P. Arnod-Prin, André Bittar, M.-H. Metzger, Stéfan Jacques Darmoni, L. Dini, C. Bouvry, Ivan Kergourlay, Frédérique Segond, Nastassia Tvardik, Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE), Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS), Service d'informatique biomédicale [Rouen], CHU Rouen, Normandie Université (NU)-Normandie Université (NU)-Université de Rouen Normandie (UNIROUEN), Normandie Université (NU), Holmes Semantic Solutions, Viseo Technologies, Laboratoire d'Informatique Médicale et Ingénierie des Connaissances en e-Santé (LIMICS), Université Paris 13 (UP13)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Sorbonne Université (SU), Equipe Traitement de l'information en Biologie Santé (TIBS - LITIS), Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes (LITIS), Université Le Havre Normandie (ULH), Normandie Université (NU)-Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA)-Université Le Havre Normandie (ULH), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA), Unité d'hygiène hospitalière et d'épidémiologie, Hospices Civils de Lyon, 5 place d'Arsonval, 69437 Lyon cedex 03, France, Hospices Civiles de Lyon, ANR-12-TECS-0006,SYNODOS,SYstème de Normalisation et d'Organisation de Données médicales textuelles pour l'Observation en Santé(2012), Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Institut National des Sciences Appliquées 
(INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Université de Rouen Normandie (UNIROUEN), and Normandie Université (NU)-Université Le Havre Normandie (ULH)
Introduction: The electronic health record (EHR) is a major potential source of data for applications such as medical decision support, evidence-based medicine, and epidemiological surveillance. Much of this data exists only as free text. Natural language processing methods can be used to mine this text and facilitate its interpretation. The aim of this project was to develop a generic semantic solution for extracting and structuring medical data for epidemiological analysis or medical decision support. The solution was designed to be as independent as possible of the medical field of application, so that any new user can write their own expert rules regardless of their area of medical expertise.

Material and methods: SYNODOS has a modular architecture that clearly separates linguistic rules from medical expert rules. Several modules were developed or adapted for this purpose: an interface between the multi-terminology server and the semantic analyzer for the extraction phase; linguistic rules to extract temporal expressions; expert rules adapted to two areas of application (nosocomial infections and cancer); and an interface between the engine and the linguistic knowledge base.

Results: The modules were integrated one after another. The multi-terminology extractor and the semantic analyzer were first interfaced for the extraction phase. The output of this processing was then integrated into a knowledge base. A user interface for accessing documents and writing business rules was developed. Expert rules were developed for the detection of nosocomial infections and for the evaluation of colon cancer management. An additional module, whose need had not been identified when the protocol was drafted, also had to be developed. This module structures the output of the data processing described above according to the patient's care pathway.
This module relies on the writing of medical expert rules. Evaluation indicators were obtained at different stages of the process (terminology extraction, semantic relations, data structuring, detection of events of interest).

Discussion: This project highlighted the value of combining different technologies (natural language processing, terminology, expert systems integration) to make unstructured data usable in epidemiology. However, because an additional module of expert rules had to be developed, a complete and operational solution could not be delivered. Furthermore, the response time of the multi-terminology extractor (ECMT V2) is too long (6 min per report). A change of technology was envisaged at the end of the project to reduce this time.

Conclusions: The originality of the SYNODOS project lies in the development of a single solution that integrates the different technologies needed to produce epidemiological indicators in the context of hospital activity. The project results confirm the value of the approach, but certain technological obstacles concerning processing time need to be resolved before the solution can be operational in a hospital environment.
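
The separation between linguistic rules and medical expert rules described under Material and methods can be illustrated with a minimal Python sketch. It is not the SYNODOS implementation: the names `Fact`, `Stay`, `late_infection_rule` and `run_rules` are hypothetical, the sample data is invented, and the 48-hour threshold is an assumption chosen only to make the rule concrete. The point is that an expert rule works on structured, dated facts and never on raw text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Fact:
    """One item produced by a (hypothetical) linguistic layer."""
    concept: str           # normalized concept label, e.g. "infection"
    when: datetime         # date resolved from a temporal expression
    negated: bool = False  # True if the text denies the concept


@dataclass(frozen=True)
class Stay:
    """Facts extracted from the reports of one hospital stay."""
    admission: datetime
    facts: List[Fact]


# An expert rule only sees structured facts, never raw text.
Rule = Callable[[Stay], Optional[str]]


def late_infection_rule(stay: Stay) -> Optional[str]:
    """Illustrative rule: infection asserted 48 h or more after admission.

    The 48 h threshold is an assumption for this sketch, not a value
    taken from the SYNODOS expert rules.
    """
    threshold = stay.admission + timedelta(hours=48)
    for fact in stay.facts:
        if fact.concept == "infection" and not fact.negated and fact.when >= threshold:
            return "possible nosocomial infection"
    return None


def run_rules(stay: Stay, rules: List[Rule]) -> List[str]:
    """Apply every expert rule and collect the events of interest."""
    events = []
    for rule in rules:
        event = rule(stay)
        if event is not None:
            events.append(event)
    return events


# Invented sample data standing in for the output of the extraction phase.
admission = datetime(2014, 3, 1, 8, 0)
stay = Stay(
    admission=admission,
    facts=[
        Fact("fever", admission + timedelta(hours=12)),
        Fact("infection", admission + timedelta(hours=20), negated=True),
        Fact("infection", admission + timedelta(hours=72)),
    ],
)
events = run_rules(stay, [late_infection_rule])
print(events)
```

Under these assumptions, adding a rule for another field of application (for example colon cancer management) would mean writing another function of the same `Rule` type, without touching the extraction layer.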