65 results for "Tannier X"
Search Results
2. Identification multimodale d'une cohorte de patients porteurs de cancers rares de la tête et du cou au sein de l'Entrepôt de données de santé (EDS) de l'AP-HP
- Author
Verdoux, M., La Rosa, A., Lolli, I., Tannier, X., Baujat, B., and Kempf, E.
- Published
- 2024
3. AB1767-HPR DOCUMENT SEARCH IN LARGE RHEUMATOLOGY DATABASES: ADVANCED KEYWORD QUERIES TO SELECT HOMOGENEOUS PHENOTYPES
- Author
Gérardin, C., primary, Xong, Y., additional, Mekinian, A., additional, Carrt, F., additional, and Tannier, X., additional
- Published
- 2023
4. Panorama des entrepôts de données hospitaliers dans les CHU/CHR de France
- Author
Doutreligne, M., primary, Degremont, A., additional, Jachiet, P-A., additional, Lamer, A., additional, and Tannier, X., additional
- Published
- 2023
5. Identification automatique des patients avec fractures ostéoporotiques à partir de comptes rendus médicaux
- Author
Bellamine, A., primary, Daniel, C., additional, Wajsburt, P., additional, Roux, C., additional, Tannier, X., additional, and Briot, K., additional
- Published
- 2021
6. 463P Impact of two waves of Sars-Cov-2 outbreak on the clinical presentation and outcomes of newly referred breast cancer cases at AP-HP: A retrospective multicenter cohort study
- Author
Priou, S., Guével, E., Lamé, G., Wassermann, J., Bey, R., Uzan, C., Chatellier, G., Belkacémi, Y., Tannier, X., Guillerm, S., Flicoteaux, R., Gligorov, J., Cohen, A., Benderra, M-A., Teixeira, L., Daniel, C., Tournigand, C., and Kempf, E.
- Published
- 2023
7. DOCUMENT SEARCH IN LARGE RHEUMATOLOGY DATABASES: ADVANCED KEYWORD QUERIES TO SELECT HOMOGENEOUS PHENOTYPES.
- Author
Gérardin, C., Xong, Y., Mekinian, A., Carrt, F., and Tannier, X.
- Published
- 2023
8. Overview of INEX 2013
- Author
Bellot, P., Doucet, A., Geva, S., Gurajada, S., Kamps, J., Kazai, G., Koolen, M., Mishra, A., Moriceau, V., Mothe, J., Preminger, M., SanJuan, E., Schenkel, R., Tannier, X., Theobald, M., Trappett, M., Wang, Q., Forner, P., Müller, H., Paredes, R., Rosso, P., Stein, B., Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Equipe Hultech - Laboratoire GREYC - UMR6072, Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen (GREYC), Université de Caen Normandie (UNICAEN), Normandie Université (NU)-Normandie Université (NU)-École Nationale Supérieure d'Ingénieurs de Caen (ENSICAEN), Normandie Université (NU)-Centre National de la Recherche Scientifique (CNRS)-Université de Caen Normandie (UNICAEN), Normandie Université (NU)-Centre National de la Recherche Scientifique (CNRS), Queensland University of Technology [Brisbane] (QUT), University of Amsterdam [Amsterdam] (UvA), Microsoft Research [Redmond], Microsoft Corporation [Redmond, Wash.], Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11), Systèmes d’Informations Généralisées (IRIT-SIG), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université 
Fédérale Toulouse Midi-Pyrénées, Oslo and Akershus University College of Applied Sciences [Oslo] (HiOA), Saarland University [Saarbrücken], Max-Planck-Institut für Informatik (MPII), Max-Planck-Gesellschaft, ILLC (FGw), Language and Computation (ILLC, FNWI/FGw), Queensland University of Technology - QUT (AUSTRALIA), University of Amsterdam - UvA (NETHERLANDS), Universität Passau (GERMANY), Laboratoire des Sciences de l'Information et des Systèmes (LSIS), Centre National de la Recherche Scientifique (CNRS)-Arts et Métiers Paristech ENSAM Aix-en-Provence-Université de Toulon (UTLN)-Aix Marseille Université (AMU), Groupe de Recherche en Informatique, Image et Instrumentation de Caen (GREYC), Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Ingénieurs de Caen (ENSICAEN), Normandie Université (NU)-Normandie Université (NU)-Université de Caen Normandie (UNICAEN), Normandie Université (NU), Microsoft Research [Cambridge] (Microsoft), Microsoft Research, Universiteit van Amsterdam (UvA), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Toulouse - Jean Jaurès (UT2J), Oslo Metropolitan University (OsloMet), Universität Passau [Passau], Laboratoire de Recherche en Informatique (LRI), CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS), Aix Marseille Université (AMU)-Université de Toulon (UTLN)-Arts et Métiers Paristech ENSAM Aix-en-Provence-Centre National de la Recherche Scientifique (CNRS), MEthodes et ingénierie des Langues, des Ontologies et du DIscours (IRIT-MELODI), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université 
Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT), Université Toulouse III - Paul Sabatier (UT3), Pamela Forner, Henning Müller, Roberto Paredes, Paolo Rosso, and Benno Stein
- Subjects
Computer science ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,02 engineering and technology ,Semantic data model ,Task (project management) ,Tweet contextualization ,020204 information systems ,INEX ,0202 electrical engineering, electronic engineering, information engineering ,Linked data track ,Contextualization ,Théorie de l'information ,Information retrieval ,Recherche d'information ,Linked data ,Snippet ,Social book search track ,Clef ,[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing ,[INFO.INFO-IT]Computer Science [cs]/Information Theory [cs.IT] ,[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR] ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,020201 artificial intelligence & image processing ,Snippet retrieval ,Computational linguistics ,Knowledge transfer
- Abstract
Article available online: http://people.mpi-inf.mpg.de/~amishra/papers/bell-over13.pdf. INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2013 evaluation campaign, which consisted of four activities addressing three themes: searching professional and user generated data (Social Book Search track); searching structured or semantic data (Linked Data track); and focused retrieval (Snippet Retrieval and Tweet Contextualization tracks). INEX 2013 was an exciting year for INEX in which we consolidated the collaboration with (other activities in) CLEF and for the second time ran our workshop as part of the CLEF labs in order to facilitate knowledge transfer between the evaluation forums. This paper gives an overview of all the INEX 2013 tracks, their aims and tasks, and the test collections built, and gives an initial analysis of the results.
- Published
- 2013
9. Clinical information extraction at the clef ehealth evaluation lab 2016
- Author
Névéol, A., Cohen, K. B., Grouin, C., Thierry Hamon, Lavergne, T., Kelly, L., Goeuriot, L., Rey, G., Robert, A., Tannier, X., Zweigenbaum, P., Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11), Université Paris-Saclay, Modélisation et Recherche d’Information Multimédia [Grenoble] (MRIM ), Laboratoire d'Informatique de Grenoble (LIG ), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), Dublin City University [Dublin] (DCU), Université Paris 13 (UP13), Recherches épidémiologiques et statistiques sur l'environnement et la santé., and Institut National de la Santé et de la Recherche Médicale (INSERM)-Institut National de la Santé et de la Recherche Médicale (INSERM)
- Subjects
[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR] ,ComputingMilieux_MISCELLANEOUS
- Published
- 2016
10. Overview of INEX 2014
- Author
Bellot, P., Bogers, T., Geva, S., Hall, M., Huurdeman, H., Kamps, J., Kazai, G., Koolen, M., Moriceau, V., Mothe, J., Preminger, M., SanJuan, E., Schenkel, R., Skov, M., Tannier, X., Walsh, D., Kanoulas, E., Lupu, M., Clough, P., Sanderson, M., Hanbury, A., Toms, E., Laboratoire des Sciences de l'Information et des Systèmes (LSIS), Centre National de la Recherche Scientifique (CNRS)-Arts et Métiers Paristech ENSAM Aix-en-Provence-Université de Toulon (UTLN)-Aix Marseille Université (AMU), Aalborg University [Denmark] (AAU), Queensland University of Technology [Brisbane] (QUT), Edge Hill University, University of Amsterdam [Amsterdam] (UvA), Semion Ltd (London, UK), Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11), Systèmes d’Informations Généralisées (IRIT-SIG), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, Université Toulouse - Jean Jaurès (UT2J), Oslo and Akershus University College of Applied Sciences [Oslo] (HiOA), Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, University of Passau, Aix-Marseille Université - AMU (FRANCE), Arts et Métiers ParisTech (FRANCE), Centre National de la Recherche Scientifique - CNRS 
(FRANCE), Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE), Université Toulouse III - Paul Sabatier - UT3 (FRANCE), Université Toulouse - Jean Jaurès - UT2J (FRANCE), Université Toulouse 1 Capitole - UT1 (FRANCE), Université du Sud Toulon-Var - USTV (FRANCE), Aalborg University (DENMARK), Edge Hill University (UNITED KINGDOM), Oslo and Akershus University College of Applied Sciences - HiOA (NORWAY), Queensland University of Technology - QUT (AUSTRALIA), Semion Ltd (UNITED KINGDOM), University of Amsterdam - UvA (NETHERLANDS), Université d'Avignon et des Pays de Vaucluse (FRANCE), Universität Passau (GERMANY), Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur - LIMSI (Orsay, France), Institut National Polytechnique de Toulouse - INPT (FRANCE), Aix Marseille Université (AMU)-Université de Toulon (UTLN)-Arts et Métiers Paristech ENSAM Aix-en-Provence-Centre National de la Recherche Scientifique (CNRS), Université Paris-Sud - Paris 11 (UP11)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Université Paris Saclay (COmUE), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT), Language and Computation (ILLC, FNWI/FGw), ILLC (FGw), Cultural Heritage and Identity, and Faculteit der Geesteswetenschappen
- Subjects
User information ,Authoritative metadata ,Contextualization ,Théorie de l'information ,Information retrieval ,Information theory ,Computer science ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Recherche d'information ,Clef ,Task (project management) ,Test (assessment) ,Metadata ,[INFO.INFO-IT]Computer Science [cs]/Information Theory [cs.IT] ,[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR] ,Social Book Search Track ,Computational linguistics ,User interface
- Abstract
INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2014 evaluation campaign, which consisted of three tracks. The Interactive Social Book Search Track investigated user information seeking behavior when interacting with various sources of information, for realistic task scenarios, and how the user interface impacts search and the search experience. The Social Book Search Track investigated the relative value of authoritative metadata and user-generated content for search and recommendation using a test collection with data from Amazon and LibraryThing, including user profiles and personal catalogues. The Tweet Contextualization Track investigated tweet contextualization, helping a user to understand a tweet by providing a short, coherent background summary generated from relevant Wikipedia passages. INEX 2014 was an exciting year for INEX in which we ran our workshop as part of the CLEF labs for the third time. This paper gives an overview of all the INEX 2014 tracks, their aims and tasks, the test collections built, and the participants, and gives an initial analysis of the results.
- Published
- 2014
11. Mixed-instance querying
- Author
Bonaque, R., primary, Cao, T. D., additional, Cautis, B., additional, Goasdoué, F., additional, Letelier, J., additional, Manolescu, I., additional, Mendoza, O., additional, Ribeiro, S., additional, and Tannier, X., additional
- Published
- 2016
12. Utilisation de la langue naturelle pour l'interrogation de documents structurés
- Author
Tannier, X., Girardot, Jean-Jacques, Mathieu, Mihaela, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur ( LIMSI ), Université Paris-Sud - Paris 11 ( UP11 ) -Centre National de la Recherche Scientifique ( CNRS ), Maison des Sciences de l'Homme et de l'Environnement Claude Nicolas Ledoux ( MSHE ), Centre National de la Recherche Scientifique ( CNRS ) -Université de Franche-Comté ( UFC ), Théoriser et modéliser pour aménager ( ThéMA ), Université de Bourgogne ( UB ) -Centre National de la Recherche Scientifique ( CNRS ) -Université de Franche-Comté ( UFC ), Département Informatique pour les Systèmes Coopératifs Ouverts et Décentralisés ( ISCOD-ENSMSE ), École des Mines de Saint-Étienne ( Mines Saint-Étienne MSE ), Institut Mines-Télécom [Paris]-Institut Mines-Télécom [Paris]-Institut Henri Fayol, Breuil, Florent, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11), Maison des Sciences de l'Homme et de l'Environnement Claude Nicolas Ledoux (MSHE), Université de Franche-Comté (UFC), Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Centre National de la Recherche Scientifique (CNRS), Théoriser et modéliser pour aménager (UMR 6049) (ThéMA), Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Centre National de la Recherche Scientifique (CNRS)-Université de Bourgogne (UB), Département Informatique pour les Systèmes Coopératifs Ouverts et Décentralisés (ISCOD-ENSMSE), École des Mines de Saint-Étienne (Mines Saint-Étienne MSE), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Institut Henri Fayol, Département Réseaux, Information, Multimédia 
(RIM-ENSMSE), and Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Centre G2I
- Subjects
ComputingMilieux_MISCELLANEOUS
- Abstract
http://www.asso-aria.org/coria/2005/19.pdf. The query language is the indispensable interface between the user and the search tool. Kept as simple as possible when engines mainly index flat documents, it becomes highly complex when it targets structured documents and must express constraints on both structure and content. The approach described here proposes using natural language as the interface for expressing such queries. The article first describes the successive phases that transform (in an information retrieval setting) the natural language query into a context-independent semantic representation. Simplification rules adapted to the structure and domain of the corpus are then applied, yielding a final form suited to conversion into a formal query language. Finally, the article describes the experiments carried out and draws first conclusions on various aspects of this approach.
- Published
- 2005
13. Report on INEX 2013
- Author
Bellot, P., primary, Doucet, A., additional, Geva, S., additional, Gurajada, S., additional, Kamps, J., additional, Kazai, G., additional, Koolen, M., additional, Mishra, A., additional, Moriceau, V., additional, Mothe, J., additional, Preminger, M., additional, SanJuan, E., additional, Schenkel, R., additional, Tannier, X., additional, Theobald, M., additional, Trappett, M., additional, Trotman, A., additional, Sanderson, M., additional, Scholer, F., additional, and Wang, Q., additional
- Published
- 2013
14. Report on INEX 2012
- Author
Bellot, P., primary, Chappell, T., additional, Doucet, A., additional, Geva, S., additional, Gurajada, S., additional, Kamps, J., additional, Kazai, G., additional, Koolen, M., additional, Landoni, M., additional, Marx, M., additional, Mishra, A., additional, Moriceau, V., additional, Mothe, J., additional, Preminger, M., additional, Ramírez, G., additional, Sanderson, M., additional, Sanjuan, E., additional, Scholer, F., additional, Schuh, A., additional, Tannier, X., additional, Theobald, M., additional, Trappett, M., additional, Trotman, A., additional, and Wang, Q., additional
- Published
- 2012
15. Report on INEX 2010
- Author
Alexander, D., primary, Arvola, P., additional, Beckers, T., additional, Bellot, P., additional, Chappell, T., additional, DeVries, C. M., additional, Doucet, A., additional, Fuhr, N., additional, Geva, S., additional, Kamps, J., additional, Kazai, G., additional, Koolen, M., additional, Kutty, S., additional, Landoni, M., additional, Moriceau, V., additional, Nayak, R., additional, Nordlie, R., additional, Pharo, N., additional, SanJuan, E., additional, Schenkel, R., additional, Tagarelli, A., additional, Tannier, X., additional, Thom, J. A., additional, Trotman, A., additional, Vainio, J., additional, Wang, Q., additional, and Wu, C., additional
- Published
- 2011
16. Evaluating Temporal Graphs Built from Texts via Transitive Reduction
- Author
Tannier, X., primary and Muller, P., additional
- Published
- 2011
17. Report on INEX 2009
- Author
Beckers, T., primary, Bellot, P., additional, Demartini, G., additional, Denoyer, L., additional, De Vries, C. M., additional, Doucet, A., additional, Fachry, K. N., additional, Fuhr, N., additional, Gallinari, P., additional, Geva, S., additional, Huang, W.-C., additional, Iofciu, T., additional, Kamps, J., additional, Kazai, G., additional, Koolen, M., additional, Kutty, S., additional, Landoni, M., additional, Lehtonen, M., additional, Moriceau, V., additional, Nayak, R., additional, Nordlie, R., additional, Pharo, N., additional, SanJuan, E., additional, Schenkel, R., additional, Tannier, X., additional, Theobald, M., additional, Thom, J. A., additional, Trotman, A., additional, and de Vries, A. P., additional
- Published
- 2010
18. Prompt Engineering Paradigms for Medical Applications: Scoping Review.
- Author
Zaghir J, Naguib M, Bjelogrlic M, Névéol A, Tannier X, and Lovis C
- Subjects
- Humans, Medical Informatics methods, Natural Language Processing
- Abstract
Background: Prompt engineering, focusing on crafting effective prompts to large language models (LLMs), has garnered attention for its capabilities at harnessing the potential of LLMs. This is even more crucial in the medical domain due to its specialized terminology and language technicity. Clinical natural language processing applications must navigate complex language and ensure privacy compliance. Prompt engineering offers a novel approach by designing tailored prompts to guide models in exploiting clinically relevant information from complex medical texts. Despite its promise, the efficacy of prompt engineering in the medical domain remains to be fully explored. Objective: The aim of the study is to review research efforts and technical approaches in prompt engineering for medical applications as well as provide an overview of opportunities and challenges for clinical practice. Methods: Databases indexing the fields of medicine, computer science, and medical informatics were queried in order to identify relevant published papers. Since prompt engineering is an emerging field, preprint databases were also considered. Multiple data were extracted, such as the prompt paradigm, the involved LLMs, the languages of the study, the domain of the topic, the baselines, and several learning, design, and architecture strategies specific to prompt engineering. We include studies that apply prompt engineering-based methods to the medical domain, published between 2022 and 2024, and covering multiple prompt paradigms such as prompt learning (PL), prompt tuning (PT), and prompt design (PD). Results: We included 114 recent prompt engineering studies. Among the 3 prompt paradigms, we have observed that PD is the most prevalent (78 papers). In 12 papers, PD, PL, and PT terms were used interchangeably. While ChatGPT is the most commonly used LLM, we have identified 7 studies using this LLM on a sensitive clinical data set.
Chain-of-thought, present in 17 studies, emerges as the most frequent PD technique. While PL and PT papers typically provide a baseline for evaluating prompt-based approaches, 61% (48/78) of the PD studies do not report any nonprompt-related baseline. Finally, we individually examine each of the key prompt engineering-specific information reported across papers and find that many studies neglect to explicitly mention them, posing a challenge for advancing prompt engineering research. Conclusions: In addition to reporting on trends and the scientific landscape of prompt engineering, we provide reporting guidelines for future studies to help advance research in the medical field. We also disclose tables and figures summarizing medical prompt engineering papers available and hope that future contributions will leverage these existing works to better advance the field. (© Jamil Zaghir, Marco Naguib, Mina Bjelogrlic, Aurélie Névéol, Xavier Tannier, Christian Lovis. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 10.09.2024.)
- Published
- 2024
19. Improving Interpretability of Leucocyte Classification with Multimodal Network.
- Author
Chossegros M, Tannier X, and Stockholm D
- Subjects
- Humans, Leukocytes classification, Leukocytes cytology, Neural Networks, Computer
- Abstract
White blood cell classification plays a key role in the diagnosis of hematologic diseases. Models can perform classification either from images or based on morphological features. Image-based classification generally yields higher performance, but feature-based classification is more interpretable for clinicians. In this study, we employed a Multimodal neural network to classify white blood cells, utilizing a combination of images and morphological features. We compared this approach with image-only and feature-only training. While the highest performance was achieved with image-only training, the Multimodal model provided enhanced interpretability by the computation of SHAP values, and revealed crucial morphological features for biological characterization of the cells.
- Published
- 2024
20. From Syntactic to Semantic Interoperability Using a Hyperontology in the Oncology Domain.
- Author
El Ghosh M, Kalokyri V, Sambres M, Vaterkowski M, Duclos C, Tannier X, Tsakou G, Tsiknakis M, Daniel C, and Dhombres F
- Subjects
- Humans, Biological Ontologies, Health Information Interoperability, Medical Oncology, Neoplasms, Big Data, Semantics
- Abstract
Interoperability is crucial to overcoming various challenges of data integration in the healthcare domain. While OMOP and FHIR data standards handle syntactic heterogeneity among heterogeneous data sources, ontologies support semantic interoperability to overcome the complexity and disparity of healthcare data. This study proposes an ontological approach in the context of the EUCAIM project to support semantic interoperability among distributed big data repositories that have applied heterogeneous cancer image data models using a semantically well-founded Hyperontology for the oncology domain.
- Published
- 2024
21. The More, the Better? Modalities of Metastatic Status Extraction on Free Medical Reports Based on Natural Language Processing.
- Author
Kempf E, Priou S, Cohen A, Redjdal A, Guével E, and Tannier X
- Subjects
- Humans, Neoplasms pathology, Electronic Health Records, Natural Language Processing, Neoplasm Metastasis
- Published
- 2024
22. Evaluating Plasmodium falciparum automatic detection and parasitemia estimation: A comparative study on thin blood smear images.
- Author
Acherar A, Tannier X, Tantaoui I, Brossas JY, Thellier M, and Piarroux R
- Subjects
- Humans, Retrospective Studies, Erythrocytes parasitology, Image Processing, Computer-Assisted methods, Neural Networks, Computer, Flow Cytometry methods, Plasmodium falciparum isolation & purification, Parasitemia diagnosis, Parasitemia blood, Parasitemia parasitology, Malaria, Falciparum diagnosis, Malaria, Falciparum blood, Malaria, Falciparum parasitology, Microscopy methods
- Abstract
Malaria is a deadly disease that is transmitted through mosquito bites. Microscopists use a microscope to examine thin blood smears at high magnification (1000x) to identify parasites in red blood cells (RBCs). Estimating parasitemia is essential in determining the severity of the Plasmodium falciparum infection and guiding treatment. However, this process is time-consuming, labor-intensive, and subject to variation, which can directly affect patient outcomes. In this retrospective study, we compared three methods for measuring parasitemia from a collection of anonymized thin blood smears of patients with Plasmodium falciparum obtained from the Clinical Department of Parasitology-Mycology, National Reference Center (NRC) for Malaria in Paris, France. We first analyzed the impact of the number of field images on parasitemia count using our framework, MALARIS, which features a top-classifier convolutional neural network (CNN). Additionally, we studied the variation between different microscopists using two manual techniques to demonstrate the need for a reliable and reproducible automated system. Finally, we included thin blood smear images from an additional 102 patients to compare the performance and correlation of our system with manual microscopy and flow cytometry. Our results showed strong correlations between the three methods, with a coefficient of determination between 0.87 and 0.92. Competing Interests: The authors have declared that no competing interests exist. (Copyright: © 2024 Acherar et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
- Published
- 2024
23. Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions.
- Author
Petit-Jean T, Gérardin C, Berthelot E, Chatellier G, Frank M, Tannier X, Kempf E, and Bey R
- Subjects
- Humans, Data Warehousing, Algorithms, France, Confidentiality, Natural Language Processing, Electronic Health Records, Workflow, Machine Learning
- Abstract
Objective: To develop and validate a natural language processing (NLP) pipeline that detects 18 conditions in French clinical notes, including 16 comorbidities of the Charlson index, while exploring a collaborative and privacy-enhancing workflow. Materials and Methods: The detection pipeline relied on rule-based and machine learning algorithms for named entity recognition and entity qualification, respectively. We used a large language model pre-trained on millions of clinical notes along with annotated clinical notes in the context of 3 cohort studies related to oncology, cardiology, and rheumatology. The overall workflow was conceived to foster collaboration between studies while respecting the privacy constraints of the data warehouse. We estimated the added values of the advanced technologies and of the collaborative setting. Results: The pipeline reached macro-averaged F1-score, positive predictive value, sensitivity, and specificity of 95.7 (95%CI 94.5-96.3), 95.4 (95%CI 94.0-96.3), 96.0 (95%CI 94.0-96.7), and 99.2 (95%CI 99.0-99.4), respectively. F1-scores were superior to those observed using alternative technologies or non-collaborative settings. The models were shared through a secured registry. Conclusions: We demonstrated that a community of investigators working on a common clinical data warehouse could efficiently and securely collaborate to develop, validate and use sensitive artificial intelligence models. In particular, we provided an efficient and robust NLP pipeline that detects conditions mentioned in clinical notes. (© The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association.)
- Published
- 2024
- Full Text
- View/download PDF
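Record 23 above reports macro-averaged metrics over the detected conditions. As a minimal sketch of what macro-averaging means, the snippet below computes a per-condition F1-score and then takes the unweighted mean. The condition names and counts are hypothetical placeholders, not data from the study, and the function names are illustrative only.

```python
from typing import Dict, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1-score from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-condition F1-scores (each condition counts equally)."""
    scores = [f1_from_counts(tp, fp, fn) for tp, fp, fn in counts.values()]
    return sum(scores) / len(scores)


# Hypothetical (tp, fp, fn) counts for two conditions; not taken from the paper.
example_counts = {
    "condition_a": (90, 10, 10),
    "condition_b": (40, 10, 0),
}
macro = macro_f1(example_counts)
print(f"macro-averaged F1 = {macro:.4f}")
```

Because every condition carries the same weight, a rare condition with poor detection lowers the macro average as much as a common one would.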
24. Predicting the age of field Anopheles mosquitoes using mass spectrometry and deep learning.
- Author
-
Mohammad N, Naudion P, Dia AK, Boëlle PY, Konaté A, Konaté L, Niang EHA, Piarroux R, Tannier X, and Nabet C
- Subjects
- Animals, Malaria transmission, Malaria prevention & control, Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization methods, Senegal, Mass Spectrometry methods, Aging physiology, Anopheles physiology, Deep Learning, Mosquito Vectors physiology
- Abstract
Mosquito-borne diseases like malaria are rising globally, and improved mosquito vector surveillance is needed. Survival of Anopheles mosquitoes is key for epidemiological monitoring of malaria transmission and evaluation of vector control strategies targeting mosquito longevity, as the risk of pathogen transmission increases with mosquito age. However, the available tools to estimate field mosquito age are often approximate and time-consuming. Here, we show a rapid method that combines matrix-assisted laser desorption/ionization-time-of-flight mass spectrometry with deep learning for mosquito age prediction. Using 2763 mass spectra from the head, legs, and thorax of 251 field-collected Anopheles arabiensis mosquitoes, we developed deep learning models that achieved a best mean absolute error of 1.74 days. We also demonstrate consistent performance at two ecological sites in Senegal, supported by age-related protein changes. Our approach is promising for malaria control and the field of vector biology, benefiting other disease vectors like Aedes mosquitoes.
- Published
- 2024
- Full Text
- View/download PDF
25. Impact of Translation on Biomedical Information Extraction: Experiment on Real-Life Clinical Notes.
- Author
-
Gérardin C, Xiong Y, Wajsbürt P, Carrat F, and Tannier X
- Abstract
Background: Biomedical natural language processing tasks are best performed with English models, and translation tools have undergone major improvements. On the other hand, building annotated biomedical data sets remains a challenge., Objective: The aim of our study is to determine whether the use of English tools to extract and normalize French medical concepts based on translations provides comparable performance to that of French models trained on a set of annotated French clinical notes., Methods: We compared 2 methods: 1 involving French-language models and 1 involving English-language models. For the native French method, the named entity recognition and normalization steps were performed separately. For the translated English method, after the first translation step, we compared a 2-step method and a terminology-oriented method that performs extraction and normalization at the same time. We used French, English, and bilingual annotated data sets to evaluate all stages (named entity recognition, normalization, and translation) of our algorithms., Results: The native French method outperformed the translated English method, with an overall F1-score of 0.51 (95% CI 0.47-0.55), compared with 0.39 (95% CI 0.34-0.44) and 0.38 (95% CI 0.36-0.40) for the 2 English methods tested., Conclusions: Despite recent improvements in translation models, there is a significant difference in performance between the 2 approaches in favor of the native French method, which is more effective on French medical texts, even with few annotated documents., (© Christel Gérardin, Yuhan Xiong, Perceval Wajsbürt, Fabrice Carrat, Xavier Tannier. Originally published in JMIR Medical Informatics (https://medinform.jmir.org).)
- Published
- 2024
- Full Text
- View/download PDF
26. Development and Validation of a Natural Language Processing Algorithm to Pseudonymize Documents in the Context of a Clinical Data Warehouse.
- Author
-
Tannier X, Wajsbürt P, Calliger A, Dura B, Mouchet A, Hilka M, and Bey R
- Abstract
Objective: The objective of this study is to address the critical issue of deidentification of clinical reports to allow access to data for research purposes, while ensuring patient privacy. The study highlights the difficulties faced in sharing tools and resources in this domain and presents the experience of the Greater Paris University Hospitals (AP-HP for Assistance Publique-Hôpitaux de Paris) in implementing a systematic pseudonymization of text documents from its Clinical Data Warehouse., Methods: We annotated a corpus of clinical documents according to 12 types of identifying entities and built a hybrid system, merging the results of a deep learning model as well as manual rules., Results and Discussion: Our results show an overall performance of 0.99 of F1-score. We discuss implementation choices and present experiments to better understand the effort involved in such a task, including dataset size, document types, language models, or rule addition. We share guidelines and code under a 3-Clause BSD license., Competing Interests: None declared., (The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).)
- Published
- 2024
- Full Text
- View/download PDF
27. Re-evaluating fetal scalp pH thresholds: An examination of fetal pH variations during labor.
- Author
-
Girault A, Le Ray C, Garabedian C, Goffinet F, and Tannier X
- Subjects
- Pregnancy, Humans, Female, Retrospective Studies, Reproducibility of Results, Fetus, Fetal Blood, Heart Rate, Fetal physiology, Hydrogen-Ion Concentration, Fetal Monitoring, Scalp, Labor, Obstetric physiology
- Abstract
Introduction: Since the 1970s, fetal scalp blood sampling (FSBS) has been used as a second-line test of the acid-base status of the fetus to evaluate fetal well-being during labor. The commonly employed thresholds that delineate normal pH (>7.25), subnormal (7.20-7.25), and pathological pH (<7.20) guide clinical decisions. However, these experience-based thresholds, based on observations and common sense, have yet to be confirmed. The aim of the study was to investigate whether the pH drop rate accelerates at the common thresholds (7.25 and 7.20) and to explore the possibility of identifying more accurate thresholds., Material and Methods: A retrospective study was conducted at a tertiary maternity hospital between June 2017 and July 2021. Patients with at least one FSBS during labor for category II fetal heart rate and delivery of a singleton cephalic infant were included. The rate of change in pH value between consecutive samples for each patient was calculated and plotted as a function of pH value. Linear regression models were used to model the evolution of the pH drop rate, estimating slope and standard errors across predefined pH intervals. Exploration of alternative pH action thresholds was conducted. To explore the independence of the association between pH value and pH drop rate, multiple linear regression adjusted for age, body mass index, parity, oxytocin stimulation and suspected small for gestational age was performed., Results: We included 2047 patients with at least one FSBS (total FSBS 3467), with 2047 umbilical cord blood pH values and a total of 5514 pH samples. Median pH values were 7.29 at 1 h before delivery and 7.26 at 30 min before delivery. The pH drop was slow between 7.40 and 7.30, then became more pronounced, with median rates of 0.0005 units/min at 7.25 and 0.0013 units/min at 7.20. Out of the alternative pH thresholds, 7.26 and 7.20 demonstrated the best alignment with our dataset. Multiple linear regression revealed that only pH value was significantly associated with the rate of pH change., Conclusions: Our study confirms the validity and reliability of current guideline thresholds for fetal scalp pH in category II fetal heart rate., (© 2023 The Authors. Acta Obstetricia et Gynecologica Scandinavica published by John Wiley & Sons Ltd on behalf of Nordic Federation of Societies of Obstetrics and Gynecology (NFOG).)
- Published
- 2024
- Full Text
- View/download PDF
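Record 27 above describes calculating the rate of change in pH between consecutive samples for each patient. The following is a minimal sketch of that calculation, not the authors' code. The sample values are invented, and pairing each rate with the pH at the start of the interval is an assumption, since the abstract does not say which pH value the rate is plotted against.

```python
from typing import List, Tuple


def ph_drop_rates(samples: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Compute pH drop rates between consecutive samples of one patient.

    samples: (minutes since first sample, pH), sorted by time.
    Returns (pH at start of interval, drop rate in pH units per minute);
    a positive rate means the pH is falling.
    """
    rates = []
    for (t0, ph0), (t1, ph1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip duplicate or out-of-order timestamps
        rates.append((ph0, (ph0 - ph1) / (t1 - t0)))
    return rates


# Hypothetical patient with three samples; values are illustrative only.
patient = [(0.0, 7.30), (40.0, 7.28), (70.0, 7.25)]
for start_ph, rate in ph_drop_rates(patient):
    print(f"from pH {start_ph:.2f}: {rate:.4f} units/min")
```

Pooling these pairs across patients gives the drop rate as a function of pH value, which is the relationship the study then models with linear regression over predefined pH intervals.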
28. Natural language processing of multi-hospital electronic health records for public health surveillance of suicidality.
- Author
-
Bey R, Cohen A, Trebossen V, Dura B, Geoffroy PA, Jean C, Landman B, Petit-Jean T, Chatellier G, Sallah K, Tannier X, Bourmaud A, and Delorme R
- Abstract
There is an urgent need to monitor the mental health of large populations, especially during crises such as the COVID-19 pandemic, to timely identify the most at-risk subgroups and to design targeted prevention campaigns. We therefore developed and validated surveillance indicators related to suicidality: the monthly number of hospitalisations caused by suicide attempts and the prevalence among them of five known risk factors. They were automatically computed by analysing the electronic health records of fifteen university hospitals of the Paris area, France, using natural language processing algorithms based on artificial intelligence. We evaluated the relevance of these indicators by conducting a retrospective cohort study. Considering 2,911,920 records contained in a common data warehouse, we tested for changes after the pandemic outbreak in the slope of the monthly number of suicide attempts by conducting an interrupted time-series analysis. We segmented the assessment time in two sub-periods: before (August 1, 2017, to February 29, 2020) and during (March 1, 2020, to June 31, 2022) the COVID-19 pandemic. We detected 14,023 hospitalisations caused by suicide attempts. Their monthly number accelerated after the COVID-19 outbreak with an estimated trend variation reaching 3.7 (95%CI 2.1-5.3), mainly driven by an increase among girls aged 8-17 (trend variation 1.8, 95%CI 1.2-2.5). After the pandemic outbreak, acts of domestic, physical and sexual violence were more often reported (prevalence ratios: 1.3, 95%CI 1.16-1.48; 1.3, 95%CI 1.10-1.64 and 1.7, 95%CI 1.48-1.98), fewer patients died (p = 0.007) and stays were shorter (p < 0.001). Our study demonstrates that textual clinical data collected in multiple hospitals can be jointly analysed to compute timely indicators describing mental health conditions of populations. Our findings also highlight the need to better take into account the violence imposed on women, especially at early ages and in the aftermath of the COVID-19 pandemic., (© 2024. The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
29. Nosocomial transmission of Aspergillus flavus in a neonatal intensive care unit: Long-term persistence in environment and interest of MALDI-ToF mass-spectrometry coupled with convolutional neural network for rapid clone recognition.
- Author
-
Mohammad N, Huguenin A, Lefebvre A, Menvielle L, Toubas D, Ranque S, Villena I, Tannier X, Normand AC, and Piarroux R
- Subjects
- Animals, Aspergillus flavus genetics, Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization methods, Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization veterinary, Intensive Care Units, Neonatal, Cross Infection veterinary, Aspergillosis diagnosis, Aspergillosis veterinary
- Abstract
Aspergillosis of the newborn remains a rare but severe disease. We report four cases of primary cutaneous Aspergillus flavus infections in premature newborns linked to incubator contamination by putative clonal strains. Our objective was to evaluate the ability of matrix-assisted laser desorption/ionisation time of flight (MALDI-TOF) coupled to a convolutional neural network (CNN) for clone recognition in a context where only a very small number of strains are available for machine learning. Clinical and environmental A. flavus isolates (n = 64) were studied, 15 of which were epidemiologically related to the four cases. All strains were typed using microsatellite length polymorphism. We found a common genotype for 9/15 related strains. The isolates of this common genotype were selected to obtain a training dataset (6 clonal isolates/25 non-clonal) and a test dataset (3 clonal isolates/31 non-clonal), and spectra were analysed with a simple CNN model. On the test dataset using the CNN model, all 31 non-clonal isolates were correctly classified, 2/3 clonal isolates were unambiguously correctly classified, whereas the third strain was undetermined (i.e., the CNN model was unable to discriminate between GT8 and non-GT8). Clonal strains of A. flavus have persisted in the neonatal intensive care unit for several years. Indeed, two strains of A. flavus isolated from incubators in September 2007 are identical to the strain responsible for the second case that occurred 3 years later. MALDI-TOF is a promising tool for detecting clonal isolates of A. flavus using CNN even with a limited training set for limited cost and handling time., (© The Author(s) 2023. Published by Oxford University Press on behalf of The International Society for Human and Animal Mycology.)
- Published
- 2024
- Full Text
- View/download PDF
30. Prediction of amputation risk of patients with diabetic foot using classification algorithms: A clinical study from a tertiary center.
- Author
-
Demirkol D, Erol ÇS, Tannier X, Özcan T, and Aktaş Ş
- Subjects
- Humans, Retrospective Studies, Bayes Theorem, Algorithms, Amputation, Surgical, Diabetic Foot surgery, Diabetic Foot diagnosis, Diabetes Mellitus
- Abstract
Diabetic foot ulcers can have serious consequences for patients, such as amputation. The primary purpose of this study is to predict the amputation risk of diabetic foot patients using machine-learning classification algorithms. In this research, 407 patients treated with the diagnosis of diabetic foot between January 2009 and September 2019 in the Department of Undersea and Hyperbaric Medicine of Istanbul University Faculty of Medicine were retrospectively evaluated. Principal Component Analysis (PCA) was used to identify the key features associated with the amputation risk in diabetic foot patients within the dataset. Various prediction/classification models were then created with different machine-learning algorithms to predict the "overall" risk of diabetic foot patients. Additionally, to optimize the hyperparameters of the Random Forest (RF) algorithm, Bayesian Optimization (BO) was used experimentally. The sub-dimension data set comprising categorical and numerical values was subjected to a feature selection procedure. Among all the algorithms tested under the defined experimental conditions, the BO-optimized "RF" based on the hybrid approach (PCA-RF-BO) and "Logistic Regression" algorithms demonstrated superior performance with 85% and 90% test accuracies, respectively. In conclusion, our findings would serve as an essential benchmark, offering valuable guidance in reducing such hazards., (© 2024 The Authors. International Wound Journal published by Medicalhelplines.com Inc and John Wiley & Sons Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
31. No changes in clinical presentation, treatment strategies and survival of pancreatic cancer cases during the SARS-COV-2 outbreak: A retrospective multicenter cohort study on real-world data.
- Author
-
Kempf E, Priou S, Lamé G, Laurent A, Guével E, Tzedakis S, Bey R, Fuks D, Chatellier G, Tannier X, Galula G, Flicoteaux R, Daniel C, and Tournigand C
- Subjects
- Humans, SARS-CoV-2, Cohort Studies, Communicable Disease Control, Retrospective Studies, Pancreatic Neoplasms, COVID-19 epidemiology, Pancreatic Neoplasms epidemiology, Pancreatic Neoplasms therapy
- Abstract
The SARS-COV-2 pandemic disrupted healthcare systems. We assessed its impact on the presentation, care trajectories and outcomes of new pancreatic cancers (PCs) in the Paris area. We performed a retrospective multicenter cohort study on the data warehouse of Greater Paris University Hospitals (AP-HP). We identified all patients newly referred with a PC between January 1, 2019, and June 30, 2021, and excluded endocrine tumors. Using claims data and health records, we analyzed the timeline of care trajectories, the initial tumor stage, the treatment categories: pancreatectomy, exclusive systemic therapy or exclusive best supportive care (BSC). We calculated patients' 1-year overall survival (OS) and compared indicators in 2019 and 2020 to 2021. We included 2335 patients. Referral fell by 29% during the first lockdown. The median time from biopsy and from first MDM to treatment were 25 days (16-50) and 21 days (11-40), respectively. Between 2019 and 2020 to 2021, the rate of metastatic tumors (36% vs 33%, P = .39), the pTNM distribution of the 464 cases with upfront tumor resection (P = .80), and the proportion of treatment categories did not vary: tumor resection (32% vs 33%), exclusive systemic therapy (49% vs 49%), exclusive BSC (19% vs 19%). The 1-year OS rates in 2019 vs 2020 to 2021 were 92% vs 89% (aHR = 1.42; 95% CI, 0.82-2.48), 52% vs 56% (aHR = 0.88; 95% CI, 0.73-1.08), 13% vs 10% (aHR = 1.00; 95% CI, 0.78-1.25), in the treatment categories, respectively. Despite an initial decrease in the number of new PCs, we did not observe any stage shift. OS did not vary significantly., (© 2023 The Authors. International Journal of Cancer published by John Wiley & Sons Ltd on behalf of UICC.)
- Published
- 2023
- Full Text
- View/download PDF
32. Development of a natural language processing model for deriving breast cancer quality indicators : A cross-sectional, multicenter study.
- Author
-
Guével E, Priou S, Flicoteaux R, Lamé G, Bey R, Tannier X, Cohen A, Chatellier G, Daniel C, Tournigand C, and Kempf E
- Subjects
- Female, Humans, Cross-Sectional Studies, Electronic Health Records, Natural Language Processing, Quality Indicators, Health Care, Breast Neoplasms epidemiology, Breast Neoplasms therapy
- Abstract
Objectives: Medico-administrative data are promising to automate the calculation of Healthcare Quality and Safety Indicators (HQSIs). Nevertheless, not all relevant indicators can be calculated with this data alone. The objectives of our feasibility study were to 1) analyze the availability of data sources; 2) analyze the availability of each indicator's elementary variables; and 3) apply natural language processing to automatically retrieve such information., Method: We performed a multicenter cross-sectional observational feasibility study on the clinical data warehouse of Assistance Publique - Hôpitaux de Paris (AP-HP). We studied the management of breast cancer patients treated at AP-HP between January 2019 and June 2021, and the quality indicators published by the European Society of Breast Cancer Specialists, using claims data from the Programme de Médicalisation du Système d'Information (PMSI) and pathology reports. For each indicator, we calculated the number (%) of patients for whom all necessary data sources were available, and the number (%) of patients for whom all elementary variables were available in the sources, and for whom the related HQSI was computable. To extract useful data from the free text reports, we developed and validated dedicated rule-based algorithms, whose performance was assessed with recall, precision, and F1-score., Results: Out of 5785 female patients diagnosed with a breast cancer (60.9 years, IQR [50.0-71.9]), 5147 (89.0%) had procedures related to breast cancer recorded in the PMSI, and 3732 (72.5%) had at least one surgery. Out of the 34 key indicators, 9 could be calculated with the PMSI alone, and 6 others became so using the data from pathology reports. Ten elementary variables were needed to calculate the 6 indicators combining the PMSI and pathology reports. The necessary sources were available for 58.8% to 94.6% of patients, depending on the indicators. The extraction algorithms developed had an average accuracy of 76.5% (min-max [32.7%-93.3%]), an average precision of 77.7% [10.0%-97.4%] and an average sensitivity of 71.6% [2.8%-100.0%]. Once these algorithms were applied, the variables needed to calculate the indicators were extracted for 2% to 88% of patients, depending on the indicators., Discussion: The availability of medical reports in the electronic health records, of the elementary variables within the reports, and the performance of the extraction algorithms limit the population for which the indicators can be calculated., Conclusions: The automated calculation of quality indicators from electronic health records is a prospect that comes up against many practical obstacles., Competing Interests: Declaration of Conflicting Interests statement The authors declare no conflict of interest., (Copyright © 2023 Elsevier Masson SAS. All rights reserved.)
- Published
- 2023
- Full Text
- View/download PDF
33. Impact of the COVID-19 pandemic on clinical presentation, treatments, and outcomes of new breast cancer patients: A retrospective multicenter cohort study.
- Author
-
Guével E, Priou S, Lamé G, Wassermann J, Bey R, Uzan C, Chatellier G, Belkacemi Y, Tannier X, Guillerm S, Flicoteaux R, Gligorov J, Cohen A, Benderra MA, Teixeira L, Daniel C, Hersant B, Tournigand C, and Kempf E
- Subjects
- Humans, Female, Pandemics, Cohort Studies, Communicable Disease Control, Retrospective Studies, Breast Neoplasms diagnosis, Breast Neoplasms epidemiology, Breast Neoplasms therapy, COVID-19 epidemiology
- Abstract
Background: The SARS CoV-2 pandemic disrupted healthcare systems. We compared the cancer stage for new breast cancers (BCs) before and during the pandemic., Methods: We performed a retrospective multicenter cohort study on the data warehouse of Greater Paris University Hospitals (AP-HP). We identified all female patients newly referred with a BC in 2019 and 2020. We assessed the timeline of their care trajectories, initial tumor stage, and treatment received: BC resection, exclusive systemic therapy, exclusive radiation therapy, or exclusive best supportive care (BSC). We calculated patients' 1-year overall survival (OS) and compared indicators in 2019 and 2020., Results: In 2019 and 2020, 2055 and 1988 new BC patients underwent cancer treatment, and during the two lockdowns, the BC diagnoses varied by -18% and by +23% compared to 2019. De novo metastatic tumors (15% and 15%, p = 0.95), pTNM and ypTNM distributions of 1332 cases with upfront resection and of 296 cases with neoadjuvant therapy did not differ (p = 0.37, p = 0.3). The median times from first multidisciplinary meeting and from diagnosis to treatment of 19 days (interquartile 11-39 days) and 35 days (interquartile 22-65 days) did not differ. Access to plastic surgery (15% and 17%, p = 0.08) and to treatment categories did not vary: tumor resection (73% and 72%), exclusive systemic therapy (13% and 14%), exclusive radiation therapy (9% and 9%), exclusive BSC (5% and 5%) (p = 0.8). Among resected patients, the neoadjuvant therapy rate was lower in 2019 (16%) versus 2020 (20%) (p = 0.02). One-year OS rates were 99.3% versus 98.9% (HR = 0.96; 95% CI, 0.77-1.2), 72.6% versus 76.6% (HR = 1.28; 95% CI, 0.95-1.72), 96.6% versus 97.8% (HR = 1.09; 95% CI, 0.61-1.94), and 15.5% versus 15.1% (HR = 0.99; 95% CI, 0.72-1.37), in the treatment groups., Conclusions: Despite a decrease in the number of new BCs, there was no tumor stage shift, and OS did not vary., (© 2023 The Authors. Cancer Medicine published by John Wiley & Sons Ltd.)
- Published
- 2023
- Full Text
- View/download PDF
34. Correction: Good practices for clinical data warehouse implementation: A case study in France.
- Author
-
Doutreligne M, Degremont A, Jachiet PA, Lamer A, and Tannier X
- Abstract
[This corrects the article DOI: 10.1371/journal.pdig.0000298.]., (Copyright: © 2023 Doutreligne et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
- Published
- 2023
- Full Text
- View/download PDF
35. Clinical Research Informatics: Contributions from 2022.
- Author
-
Tannier X and Kalra D
- Subjects
- Humans, Electronic Health Records, Big Data, Peer Review, Artificial Intelligence, Medical Informatics
- Abstract
Objectives: To summarize key contributions to current research in the field of Clinical Research Informatics (CRI) and to select best papers published in 2022., Method: A bibliographic search using a combination of Medical Subject Headings (MeSH) descriptors and free-text terms on CRI was performed using PubMed, followed by a double-blind review in order to select a list of candidate best papers to be then peer-reviewed by external reviewers. After peer-review ranking, a consensus meeting between the two section editors and the editorial team was organized to finally conclude on the selected three best papers., Results: Among the 1,324 papers returned by the search, published in 2022, that were in the scope of the various areas of CRI, the full review process selected four best papers. The first best paper describes the process undertaken in Germany, under the national Medical Informatics Initiative, to define a process and to gain multi-decision-maker acceptance of broad consent for the reuse of health data for research whilst remaining compliant with the European General Data Protection Regulation. The authors of the second-best paper present a federated architecture for the conduct of clinical trial feasibility queries that utilizes HL7 Fast Healthcare Interoperability Resources and an HL7 standard query representation. The third best paper aligns with the overall theme of this Yearbook, the inclusivity of potential participants in clinical trials, with recommendations to ensure greater equity. The fourth proposes a multi-modal modelling approach for large scale phenotyping from electronic health record information. 
This year's survey paper has also examined equity, along with data bias, and found that the relevant publications in 2022 have focused almost exclusively on the issue of bias in Artificial Intelligence (AI)., Conclusions: The literature relevant to CRI in 2022 has largely been dominated by publications that seek to maximise the reusability of wide scale and representative electronic health record information for research, either as big data for distributed analysis or as a source of information from which to identify suitable patients accurately and equitably for invitation to participate in clinical trials., Competing Interests: Disclosure The authors report no conflicts of interest in this work., (IMIA and Thieme. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).)
- Published
- 2023
- Full Text
- View/download PDF
36. Good practices for clinical data warehouse implementation: A case study in France.
- Author
-
Doutreligne M, Degremont A, Jachiet PA, Lamer A, and Tannier X
- Abstract
Real-world data (RWD) holds great promise to improve the quality of care. However, specific infrastructures and methodologies are required to derive robust knowledge and bring innovations to the patient. Drawing upon the national case study of the governance of the 32 French regional and university hospitals, we highlight key aspects of modern clinical data warehouses (CDWs): governance, transparency, types of data, data reuse, technical tools, documentation, and data quality control processes. Semi-structured interviews as well as a review of reported studies on French CDWs were conducted from March to November 2022. Out of 32 regional and university hospitals in France, 14 have a CDW in production, 5 are experimenting, 5 have a prospective CDW project, 8 did not have any CDW project at the time of writing. The implementation of CDWs in France dates from 2011 and accelerated in late 2020. From this case study, we draw some general guidelines for CDWs. The current orientation of CDWs towards research requires efforts in governance stabilization, standardization of data schema, and development in data quality and data documentation. Particular attention must be paid to the sustainability of the warehouse teams and to the multilevel governance. The transparency of the studies and of the data transformation tools must improve to allow successful multicentric data reuses as well as innovations in routine care., Competing Interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: The first author did a (non-paid) visiting in Leo Anthony Celi’s lab during the first semester of 2023., (Copyright: © 2023 Doutreligne et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
- Published
- 2023
- Full Text
- View/download PDF
37. How to Improve Cancer Patients ENrollment in Clinical Trials From rEal-Life Databases Using the Observational Medical Outcomes Partnership Oncology Extension: Results of the PENELOPE Initiative in Urologic Cancers.
- Author
-
Kempf E, Vaterkowski M, Leprovost D, Griffon N, Ouagne D, Breant S, Serre P, Mouchet A, Rance B, Chatellier G, Bellamine A, Frank M, Guerin J, Tannier X, Livartowski A, Hilka M, and Daniel C
- Subjects
- Humans, Data Warehousing, Databases, Factual, Urology, Urologic Neoplasms diagnosis, Urologic Neoplasms therapy
- Abstract
Purpose: To compare the computability of Observational Medical Outcomes Partnership (OMOP)-based queries related to prescreening of patients using two versions of the OMOP common data model (CDM; v5.3 and v5.4) and to assess the performance of the Greater Paris University Hospital (APHP) prescreening tool., Materials and Methods: We identified the prescreening information items being relevant for prescreening of patients with cancer. We randomly selected 15 academic and industry-sponsored urology phase I-IV clinical trials (CTs) launched at APHP between 2016 and 2021. The computability of the related prescreening criteria (PC) was defined by their translation rate in OMOP-compliant queries and by their execution rate on the APHP clinical data warehouse (CDW) containing data of 205,977 patients with cancer. The overall performance of the prescreening tool was assessed by the rate of true- and false-positive cases of three randomly selected CTs., Results: We defined a list of 15 minimal information items being relevant for patients' prescreening. We identified 83 PC of the 534 eligibility criteria from the 15 CTs. We translated 33 and 62 PC in queries on the basis of OMOP CDM v5.3 and v5.4, respectively (translation rates of 40% and 75%, respectively). Of the 33 PC translated in the v5.3 of the OMOP CDM, 19 could be executed on the APHP CDW (execution rate of 58%). Of 83 PC, the computability rate on the APHP CDW reached 23%. On the basis of three CTs, we identified 17, 32, and 63 patients as being potentially eligible for inclusion in those CTs, resulting in positive predictive values of 53%, 41%, and 21%, respectively., Conclusion: We showed that PC could be formalized according to the OMOP CDM and that the oncology extension increased their translation rate through better representation of cancer natural history.
- Published
- 2023
- Full Text
- View/download PDF
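The rates reported in record 37 above follow directly from the counts given in its abstract. The short sketch below reproduces that arithmetic; the variable and function names are illustrative, and rounding to the nearest whole percent is an assumption about how the published figures were obtained.

```python
def percent(numerator: int, denominator: int) -> int:
    """Return numerator/denominator as a percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)


# Counts taken from the abstract of record 37.
total_pc = 83          # prescreening criteria (PC) identified in the 15 trials
translated_v53 = 33    # PC translated into OMOP CDM v5.3 queries
translated_v54 = 62    # PC translated into OMOP CDM v5.4 queries
executed_v53 = 19      # v5.3 queries that could be executed on the APHP CDW

translation_rate_v53 = percent(translated_v53, total_pc)   # 33 / 83
translation_rate_v54 = percent(translated_v54, total_pc)   # 62 / 83
execution_rate_v53 = percent(executed_v53, translated_v53)  # 19 / 33
computability_rate = percent(executed_v53, total_pc)        # 19 / 83

print(translation_rate_v53, translation_rate_v54, execution_rate_v53, computability_rate)
# → 40 75 58 23
```

The overall computability rate is the product of the two stages: a criterion must first be translatable into an OMOP query and then executable on the warehouse, so 19 of the original 83 criteria remain.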
38. Improving the Detection of Epidemic Clones in Candida parapsilosis Outbreaks by Combining MALDI-TOF Mass Spectrometry and Deep Learning Approaches.
- Author
-
Mohammad N, Normand AC, Nabet C, Godmer A, Brossas JY, Blaize M, Bonnal C, Fekkar A, Imbert S, Tannier X, and Piarroux R
- Abstract
Identifying fungal clones propagated during outbreaks in hospital settings is a problem that increasingly confronts biologists. Current tools based on DNA sequencing or microsatellite analysis require specific manipulations that are difficult to implement in the context of routine diagnosis. Using deep learning to classify the mass spectra obtained during the routine identification of fungi by MALDI-TOF mass spectrometry could be of interest to differentiate isolates belonging to epidemic clones from others. As part of the management of a nosocomial outbreak due to Candida parapsilosis in two Parisian hospitals, we studied the impact of the preparation of the spectra on the performance of a deep neural network. Our purpose was to differentiate 39 otherwise fluconazole-resistant isolates belonging to a clonal subset from 56 other isolates, most of which were fluconazole-susceptible, collected during the same period and not belonging to the clonal subset. Our study carried out on spectra obtained on four different machines from isolates cultured for 24 or 48 h on three different culture media showed that each of these parameters had a significant impact on the performance of the classifier. In particular, using different culture times between learning and testing steps could lead to a collapse in the accuracy of the predictions. On the other hand, including spectra obtained after 24 and 48 h of growth during the learning step restored good results. Finally, we showed that the deleterious effect of variability between the devices used for learning and testing could be largely mitigated by including a spectra alignment step during preprocessing before submitting them to the neural network. Taken together, these experiments show the great potential of deep learning models to identify spectra of specific clones, provided that crucial parameters are controlled during both culture and preparation steps before submitting spectra to a classifier.
- Published
- 2023
39. Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study.
- Author
-
Gérardin C, Mageau A, Mékinian A, Tannier X, and Carrat F
- Abstract
Background: Reliable and interpretable automatic extraction of clinical phenotypes from large electronic medical record databases remains a challenge, especially in a language other than English. Objective: We aimed to provide an automated end-to-end extraction of cohorts of similar patients from electronic health records for systemic diseases. Methods: Our multistep algorithm includes a named-entity recognition step, a multilabel classification using medical subject headings ontology, and the computation of patient similarity. A selection of cohorts of similar patients on a priori annotated phenotypes was performed. Six phenotypes were selected for their clinical significance: P1, osteoporosis; P2, nephritis in systemic lupus erythematosus; P3, interstitial lung disease in systemic sclerosis; P4, lung infection; P5, obstetric antiphospholipid syndrome; and P6, Takayasu arteritis. We used a training set of 151 clinical notes and an independent validation set of 256 clinical notes, with annotated phenotypes, both extracted from the Assistance Publique-Hôpitaux de Paris data warehouse. We evaluated the 3 patients closest to the index patient for each phenotype using precision-at-3, recall, and average precision. Results: For P1-P4, the precision-at-3 ranged from 0.85 (95% CI 0.75-0.95) to 0.99 (95% CI 0.98-1), the recall ranged from 0.53 (95% CI 0.50-0.55) to 0.83 (95% CI 0.81-0.84), and the average precision ranged from 0.58 (95% CI 0.54-0.62) to 0.88 (95% CI 0.85-0.90). P5-P6 phenotypes could not be analyzed due to the limited number of phenotypes. Conclusions: Using a method close to clinical reasoning, we built a scalable and interpretable end-to-end algorithm for extracting cohorts of similar patients. (©Christel Gérardin, Arthur Mageau, Arsène Mékinian, Xavier Tannier, Fabrice Carrat. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 19.12.2022.)
- Published
- 2022
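The abstract above evaluates retrieved patients with precision-at-3 and average precision. As a rough illustration of these two standard ranking metrics (not the authors' evaluation code), the Python sketch below assumes each ranked list is a sequence of booleans saying whether the retrieved patient shares the index patient's phenotype; the function names are hypothetical.

```python
def precision_at_k(ranked_relevance, k=3):
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_relevance[:k]) / k


def average_precision(ranked_relevance):
    """Mean of precision-at-i over the ranks i that hold a relevant item."""
    hits = 0
    precisions = []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    # By convention, return 0.0 when no relevant item was retrieved.
    return sum(precisions) / len(precisions) if precisions else 0.0


# Toy ranking: the 1st, 2nd and 4th closest patients share the phenotype.
example = [True, True, False, True]
p_at_3 = precision_at_k(example, k=3)
ap = average_precision(example)
```

Precision-at-3 only looks at the three nearest patients, whereas average precision rewards rankings that place every relevant patient early.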
40. Analysis of risk factors for amputation in patients with diabetic foot ulcers: a cohort study from a tertiary center.
- Author
-
Demirkol D, Aktaş Ş, Özcan T, Tannier X, and Selçukcan Erol Ç
- Subjects
- Humans, Male, Female, Adult, Middle Aged, Aged, Aged, 80 and over, Cohort Studies, Retrospective Studies, C-Reactive Protein, Creatinine, Amputation, Surgical, Risk Factors, Lipoproteins, Diabetic Foot surgery, Diabetes Mellitus
- Abstract
Objective: This study aimed to analyze risk factors for amputation (overall, minor and major) in patients with diabetic foot ulcers (DFUs). Methods: 407 patients with DFUs (286 male, 121 female; mean age = 60, age range = 32-92) who were managed in a tertiary care centre from 2009 to 2019 were retrospectively identified and included in the study. DFUs were categorized based on the Meggitt-Wagner, PEDIS, S(AD)SAD, and University of Texas (UT) classification systems. To identify amputation risk-related factors, results of patients with DFUs who underwent amputations (minor or major) were compared to those who received other adjunctive treatments using Chi-Square, one-way analysis of variance (ANOVA) and Spearman correlation analysis. Results: The mean C-reactive protein (CRP) and White Blood Cell (WBC) values were significantly higher in patients with major or minor amputation than in those without amputation. The mean Neutrophil (PNL), Platelets (PLT), wound width, creatinine and sedimentation (ESR) values were significantly higher in patients with major amputation compared to other groups of patients. Elevated levels of High-density lipoprotein (HDL), Hemoglobin (HGB) and albumin were determined to be protective factors against the risk of amputation. Spearman correlation analysis revealed a strong, significant positive correlation between Wagner grades and amputation status of patients. Conclusion: This study has identified specific factors for major and minor amputation risk of patients with DFUs. In particular, infection markers such as CRP, WBC, ESR and PNL were higher in the amputation group. Most importantly, the Meggitt-Wagner classification, one of the four classification systems applied to DFUs, was determined to be highly associated with patients' amputation risk. Level of Evidence: Level IV, Prognostic Study.
- Published
- 2022
41. Influence of the SARS-CoV-2 outbreak on management and prognosis of new lung cancer cases, a retrospective multicentre real-life cohort study.
- Author
-
Priou S, Lamé G, Zalcman G, Wislez M, Bey R, Chatellier G, Cadranel J, Tannier X, Zelek L, Daniel C, Tournigand C, and Kempf E
- Subjects
- Cohort Studies, Communicable Disease Control, Humans, Pandemics, Prognosis, Retrospective Studies, SARS-CoV-2, COVID-19 epidemiology, Lung Neoplasms drug therapy, Lung Neoplasms therapy
- Abstract
Introduction: The SARS-CoV-2 pandemic has impacted the care of cancer patients. This study sought to assess the pandemic's impact on the clinical presentations and outcomes of newly referred patients with lung cancer from the Greater Paris area. Methods: We retrospectively retrieved the electronic health records and administrative data of 11.4 million patients pertaining to Greater Paris University Hospital (AP-HP). We compared indicators for the 2018-2019 period to those of 2020 in regard to newly referred lung cancer cases. We assessed the initial tumour stage, the delay between the first multidisciplinary tumour board (MTB) and anticancer treatment initiation, and 6-month overall survival (OS) rates depending on the anticancer treatment, including surgery, palliative systemic treatment, and best supportive care (BSC). Results: Among 6240 patients with lung cancer, 2179 (35%) underwent tumour resection, 2069 (33%) systemic anticancer therapy, 775 (12%) BSC, whereas 1217 (20%) did not receive any treatment. During the first lockdown, the rate of new diagnoses decreased by 32% compared with that recorded in 2018-2019. Initial tumour stage, distribution of patients among treatment categories, and MTB-related delays remained unchanged. The 6-month OS rates of patients diagnosed in 2018-2019 who underwent tumour resection were 98% versus 97% (HR = 1.2; 95% CI: 0.7-2.0) for those diagnosed in 2020; the respective rates for patients who underwent systemic anticancer therapy were 78% versus 79% (HR = 1.0; 95% CI: 0.8-1.2); these rates were 20% versus 13% (HR = 1.3; 95% CI: 1.1-1.6) for those who received BSC. COVID-19 was associated with poorer OS rates (HR = 2.1; 95% CI: 1.6-3.0) for patients who received systemic anticancer therapy. Conclusions: The SARS-CoV-2 pandemic has not exerted any deleterious impact on 6-month OS of new lung cancer patients that underwent active anticancer therapy in Greater Paris University hospitals. Competing Interests: Conflict of interest statement The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. (Copyright © 2022 Elsevier Ltd. All rights reserved.)
- Published
- 2022
42. Clinical Research Informatics.
- Author
-
Daniel C, Tannier X, and Kalra D
- Subjects
- Humans, Big Data, COVID-19, Data Collection, Pandemics, Medical Informatics, Biomedical Research
- Abstract
Objectives: To summarize key contributions to current research in the field of Clinical Research Informatics (CRI) and to select best papers published in 2021. Method: Using PubMed, we did a bibliographic search using a combination of MeSH descriptors and free-text terms on CRI, followed by a double-blind review in order to select a list of candidate best papers to be peer-reviewed by external reviewers. After peer-review ranking, three section editors met for a consensus meeting and the editorial team was organized to finally conclude on the selected three best papers. Results: Among the 1,096 papers (published in 2021) returned by the search and in the scope of the various areas of CRI, the full review process selected three best papers. The first best paper describes an operational and scalable framework for generating EHR datasets based on a detailed clinical model with an application in the domain of the COVID-19 pandemic. The authors of the second best paper present a secure and scalable platform for the preprocessing of biomedical data for deep data-driven health management applied for the detection of pre-symptomatic COVID-19 cases and for biological characterization of insulin-resistance heterogeneity. The third best paper provides a contribution to the integration of care and research activities with the REDCap Clinical Data and Interoperability Services (CDIS) module improving the accuracy and efficiency of data collection. Conclusions: The COVID-19 pandemic is still significantly stimulating research efforts in the CRI field to improve the process deeply and widely for conducting real-world studies as well as for optimizing clinical trials, the duration and cost of which are constantly increasing. The current health crisis highlights the need for healthcare institutions to continue the development and deployment of Big Data spaces, to strengthen their expertise in data science and to implement efficient data quality evaluation and improvement programs. Competing Interests: Disclosure The authors report no conflicts of interest in this work. (IMIA and Thieme. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).)
- Published
- 2022
43. Extraction of Explicit and Implicit Cause-Effect Relationships in Patient-Reported Diabetes-Related Tweets From 2017 to 2021: Deep Learning Approach.
- Author
-
Ahne A, Khetan V, Tannier X, Rizvi MIH, Czernichow T, Orchard F, Bour C, Fano A, and Fagherazzi G
- Abstract
Background: Intervening in and preventing diabetes distress requires an understanding of its causes, in particular from a patient's perspective. Social media data provide direct access to how patients see and understand their disease and consequently show the causes of diabetes distress. Objective: Leveraging machine learning methods, we aim to extract both explicit and implicit cause-effect relationships in patient-reported diabetes-related tweets and provide a methodology to better understand the opinions, feelings, and observations shared within the diabetes online community from a causality perspective. Methods: More than 30 million diabetes-related tweets in English were collected between April 2017 and January 2021. Deep learning and natural language processing methods were applied to focus on tweets with personal and emotional content. A cause-effect tweet data set was manually labeled and used to train (1) a fine-tuned BERTweet model to detect causal sentences containing a causal relation and (2) a conditional random field model with Bidirectional Encoder Representations from Transformers (BERT)-based features to extract possible cause-effect associations. Causes and effects were clustered in a semisupervised approach and visualized in an interactive cause-effect network. Results: Causal sentences were detected with a recall of 68% in an imbalanced data set. A conditional random field model with BERT-based features outperformed a fine-tuned BERT model for cause-effect detection with a macro recall of 68%. This led to 96,676 sentences with cause-effect relationships. "Diabetes" was identified as the central cluster followed by "death" and "insulin." Insulin pricing-related causes were frequently associated with death. Conclusions: A novel methodology was developed to detect causal sentences and identify both explicit and implicit, single and multiword causes, and the corresponding effects, as expressed in diabetes-related tweets, leveraging BERT-based architectures and visualized as a cause-effect network. Extracting causal associations from real-life, patient-reported outcomes in social media data provides a useful complementary source of information in diabetes research. (©Adrian Ahne, Vivek Khetan, Xavier Tannier, Md Imbesat Hassan Rizvi, Thomas Czernichow, Francisco Orchard, Charline Bour, Andrew Fano, Guy Fagherazzi. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 19.07.2022.)
- Published
- 2022
44. Privacy-preserving mimic models for clinical named entity recognition in French.
- Author
-
Bannour N, Wajsbürt P, Rance B, Tannier X, and Névéol A
- Subjects
- Humans, Natural Language Processing, Narration, Privacy
- Abstract
A vast amount of crucial information about patients resides solely in unstructured clinical narrative notes. There has been a growing interest in the clinical Named Entity Recognition (NER) task using deep learning models. Such approaches require sufficient annotated data. However, there are few publicly available annotated corpora in the medical field due to the sensitive nature of clinical text. In this paper, we tackle this problem by building privacy-preserving shareable models for French clinical Named Entity Recognition using the mimic learning approach to enable knowledge transfer from a teacher model trained on a private corpus to a student model. This student model could be publicly shared without any access to the original sensitive data. We evaluated three privacy-preserving models using three medical corpora and compared the performance of our models to those of baseline models such as dictionary-based models. An overall macro F-measure of 70.6% could be achieved by a student model trained using silver annotations produced by the teacher model, compared to 85.7% for the original private teacher model. Our results revealed that these privacy-preserving mimic learning models offer a good compromise between performance and data privacy preservation. (Copyright © 2022 Elsevier Inc. All rights reserved.)
- Published
- 2022
45. Multilabel classification of medical concepts for patient clinical profile identification.
- Author
-
Gérardin C, Wajsbürt P, Vaillant P, Bellamine A, Carrat F, and Tannier X
- Subjects
- Data Mining, Humans, Language, Unified Medical Language System, Multilingualism, Natural Language Processing
- Abstract
Background: The development of electronic health records has provided a large volume of unstructured biomedical information. Extracting patient characteristics from these data has become a major challenge, especially in languages other than English. Methods: Inspired by the French Text Mining Challenge (DEFT 2021) [1] in which we participated, our study proposes a multilabel classification of clinical narratives, allowing us to automatically extract the main features of a patient report. Our system is an end-to-end pipeline from raw text to labels with two main steps: named entity recognition and multilabel classification. Both steps rely on a transformer-based neural network architecture. To train our final classifier, we extended the dataset with all English and French Unified Medical Language System (UMLS) vocabularies related to human diseases. We focus our study on the multilingualism of training resources and models, with experiments combining French and English in different ways (multilingual embeddings or translation). Results: We obtained an overall average micro-F1 score of 0.811 for the multilingual version, 0.807 for the French-only version and 0.797 for the translated version. Conclusion: Our study proposes an original multilabel classification of French clinical notes for patient phenotyping. We show that a multilingual algorithm trained on annotated real clinical notes and UMLS vocabularies leads to the best results. (Copyright © 2022 Elsevier B.V. All rights reserved.)
- Published
- 2022
46. Impact of two waves of Sars-Cov2 outbreak on the number, clinical presentation, care trajectories and survival of patients newly referred for a colorectal cancer: A French multicentric cohort study from a large group of university hospitals.
- Author
-
Kempf E, Priou S, Lamé G, Daniel C, Bellamine A, Sommacale D, Belkacemi Y, Bey R, Galula G, Taright N, Tannier X, Rance B, Flicoteaux R, Hemery F, Audureau E, Chatellier G, and Tournigand C
- Subjects
- Cohort Studies, Hospitals, University, Humans, Pandemics, RNA, Viral, Retrospective Studies, SARS-CoV-2, COVID-19 epidemiology, Colonic Neoplasms, Colorectal Neoplasms epidemiology, Colorectal Neoplasms therapy
- Abstract
The SARS-Cov2 pandemic may have impaired care trajectories, patient overall survival (OS), and tumor stage at initial presentation for new colorectal cancer (CRC) cases. This study aimed at assessing those indicators before and after the beginning of the pandemic in France. In this retrospective cohort study, we collected prospectively the clinical data of the 11.4 million patients referred to the Greater Paris University Hospitals (AP-HP). We identified new CRC cases between 1 January 2018 and 31 December 2020, and compared indicators for 2018-2019 to 2020. pTNM tumor stage was extracted from postoperative pathology reports for localized colon cancer, and metastatic status was extracted from CT-scan baseline text reports. Between 2018 and 2020, 3602 and 1083 new colon and rectal cancers were referred to the AP-HP, respectively. The 1-year OS rates reached 94%, 93% and 76% for new CRC patients undergoing a resection of the primary tumor, in 2018-2019, in 2020 without any Sars-Cov2 infection and in 2020 with a Sars-Cov2 infection, respectively (HR 3.78, 95% CI 2.1-7.1). For patients undergoing other kinds of anticancer treatment, the rates were 64%, 66% and 27% (HR 2.1, 95% CI 1.4-3.3). Tumor stage at initial presentation, emergency level of primary tumor resection, and delays between the first multidisciplinary meeting and the first anticancer treatment did not differ over time. The SARS-Cov2 pandemic has been associated with fewer newly diagnosed CRC patients and worse 1-year OS rates attributable to the infection itself rather than to its impact on hospital care delivery or tumor stage at initial presentation. (© 2022 UICC.)
- Published
- 2022
47. Identification of a clonal population of Aspergillus flavus by MALDI-TOF mass spectrometry using deep learning.
- Author
-
Normand AC, Chaline A, Mohammad N, Godmer A, Acherar A, Huguenin A, Ranque S, Tannier X, and Piarroux R
- Subjects
- Humans, Neural Networks, Computer, Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization methods, Aspergillus flavus isolation & purification, Deep Learning
- Abstract
The spread of fungal clones is hard to detect in the daily routine of clinical laboratories, and there is a need for new tools that can facilitate clone detection within a set of strains. Currently, Matrix Assisted Laser Desorption-Ionization Time-of-Flight Mass Spectrometry is extensively used to identify microbial isolates at the species level. Since most clinical laboratories are equipped with this technology, there is a question of whether this equipment can sort a particular clone from a population of various isolates of the same species. We performed an experiment in which 19 clonal isolates of Aspergillus flavus initially collected on contaminated surgical masks were included in a set of 55 A. flavus isolates of various origins. A simple convolutional neural network (CNN) was trained to detect the isolates belonging to the clone. In this experiment, the training and testing sets were totally independent, and different MALDI-TOF devices (Microflex) were used for the training and testing phases. The CNN was used to correctly sort a large portion of the isolates, with excellent (> 93%) accuracy for two of the three devices used and with less accuracy for the third device (69%), which was older and needed to have the laser replaced. (© 2022. The Author(s).)
- Published
- 2022
48. Improving Diabetes-Related Biomedical Literature Exploration in the Clinical Decision-making Process via Interactive Classification and Topic Discovery: Methodology Development Study.
- Author
-
Ahne A, Fagherazzi G, Tannier X, Czernichow T, and Orchard F
- Subjects
- Clinical Decision-Making, Humans, Neural Networks, Computer, PubMed, Diabetes Mellitus therapy, Medical Subject Headings
- Abstract
Background: The amount of available textual health data such as scientific and biomedical literature is constantly growing, making it increasingly challenging for health professionals to properly summarize those data and practice evidence-based clinical decision making. Moreover, the exploration of unstructured health text data is challenging for professionals without computer science knowledge due to limited time, resources, and skills. Current tools to explore text data lack ease of use, require high computational efforts, and incorporate domain knowledge and focus on topics of interest with difficulty. Objective: We developed a methodology able to explore and target topics of interest via an interactive user interface for health professionals with limited computer science knowledge. We aim to reach near state-of-the-art performance while reducing memory consumption, increasing scalability, and minimizing user interaction effort to improve the clinical decision-making process. The performance was evaluated on diabetes-related abstracts from PubMed. Methods: The methodology consists of 4 parts: (1) a novel interpretable hierarchical clustering of documents where each node is defined by headwords (words that best represent the documents in the node), (2) an efficient classification system to target topics, (3) minimized user interaction effort through active learning, and (4) a visual user interface. We evaluated our approach on 50,911 diabetes-related abstracts providing a hierarchical Medical Subject Headings (MeSH) structure, a unique identifier for a topic. Hierarchical clustering performance was compared against the implementation in the machine learning library scikit-learn. On a subset of 2000 randomly chosen diabetes abstracts, our active learning strategy was compared against 3 other strategies: random selection of training instances, uncertainty sampling that chooses instances about which the model is most uncertain, and an expected gradient length strategy based on convolutional neural networks (CNNs). Results: For the hierarchical clustering performance, we achieved an F1 score of 0.73 compared to 0.76 achieved by scikit-learn. Concerning active learning performance, after 200 chosen training samples based on these strategies, the weighted F1 score over all MeSH codes reached 0.62 using our approach, 0.61 using the uncertainty strategy, 0.63 using the CNN, and 0.45 using the random strategy. Moreover, our methodology showed a constant low memory use with increased number of documents. Conclusions: We proposed an easy-to-use tool that lets health professionals with limited computer science knowledge combine their domain knowledge with topic exploration and target specific topics of interest while improving transparency. Furthermore, our approach is memory efficient and highly parallelizable, making it interesting for large Big Data sets. This approach can be used by health professionals to gain deep insights into biomedical literature to ultimately improve the evidence-based clinical decision-making process. (©Adrian Ahne, Guy Fagherazzi, Xavier Tannier, Thomas Czernichow, Francisco Orchard. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 18.01.2022.)
- Published
- 2022
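One of the baselines named in the abstract above is uncertainty sampling, which picks the instances the model is least sure about. The abstract does not say which uncertainty measure was used, so the Python sketch below assumes the common least-confidence variant: rank unlabeled instances by the probability of their most likely class and take the lowest. The function name and toy probabilities are illustrative.

```python
def least_confident(probabilities, n):
    """Return the indices of the n instances with the lowest top-class probability.

    probabilities: one list of predicted class probabilities per unlabeled instance.
    """
    ranked = sorted(range(len(probabilities)), key=lambda i: max(probabilities[i]))
    return ranked[:n]


# Toy predictions for three unlabeled abstracts over two classes.
predicted = [
    [0.90, 0.10],  # confident
    [0.55, 0.45],  # most uncertain
    [0.60, 0.40],  # fairly uncertain
]
to_label = least_confident(predicted, n=2)
```

The selected instances would then be sent to the annotator, the model retrained, and the selection repeated.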
49. New cancer cases at the time of SARS-Cov2 pandemic and related public health policies: A persistent and concerning decrease long after the end of the national lockdown.
- Author
-
Kempf E, Lamé G, Layese R, Priou S, Chatellier G, Chaieb H, Benderra MA, Bellamine A, Bey R, Bréant S, Galula G, Taright N, Tannier X, Guyet T, Salamanca E, Audureau E, Daniel C, and Tournigand C
- Subjects
- Aged, Female, France epidemiology, Health Policy, Humans, Male, Middle Aged, Neoplasms diagnosis, Quarantine, SARS-CoV-2, COVID-19, Neoplasms epidemiology
- Abstract
Introduction: The dissemination of SARS-Cov2 may have delayed the diagnosis of new cancers. This study aimed at assessing the number of new cancers during and after the lockdown. Methods: We prospectively collected the clinical data of the 11.4 million patients referred to the Assistance Publique Hôpitaux de Paris Teaching Hospital. We identified new cancer cases between 1st January 2018 and 31st September 2020 and compared indicators for 2018 and 2019 to 2020 with a focus on the French lockdown (17th March to 11th May 2020) across cancer types and patient age classes. Results: Between January and September, 28,348, 27,272 and 23,734 new cancer cases were identified in 2018, 2019 and 2020, respectively. The monthly median number of new cases reached 3168 (interquartile range, IQR, 3027; 3282), 3054 (IQR 2945; 3127) and 2723 (IQR 2085; 2863) in 2018, 2019 and 2020, respectively. From March 1st to May 31st, new cancer cases decreased by 30% in 2020 compared to the 2018-19 average; then by 9% from 1st June to 31st September. This evolution was consistent across all tumour types: -30% and -9% for colon, -27% and -6% for lung, -29% and -14% for breast, -33% and -12% for prostate cancers, respectively. For patients aged <70 years, the decrease in new colorectal and breast cancers in April 2020 compared with the 2018-2019 average reached 41% and 39%, respectively. Conclusion: The SARS-Cov2 pandemic led to a substantial decrease in new cancer cases. Delays in cancer diagnoses may affect clinical outcomes in the coming years. Competing Interests: Conflict of interest statement The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. (Copyright © 2021 Elsevier Ltd. All rights reserved.)
- Published
- 2021
50. Medical concept normalization in French using multilingual terminologies and contextual embeddings.
- Author
-
Wajsbürt P, Sarfati A, and Tannier X
- Subjects
- Language, Natural Language Processing, Multilingualism, Unified Medical Language System
- Abstract
Introduction: Concept normalization is the task of linking terms from textual medical documents to their concept in terminologies such as the UMLS®. Traditional approaches to this problem depend heavily on the coverage of available resources, which poses a problem for languages other than English. Objective: We present a system for concept normalization in French. We consider textual mentions already extracted and labeled by a named entity recognition system, and we classify these mentions with a UMLS concept unique identifier. We take advantage of the multilingual nature of available terminologies and embedding models to improve concept normalization in French without translation or direct supervision. Materials and Methods: We consider the task as a highly multiclass classification problem. The terms are encoded with contextualized embeddings and classified via cosine similarity and softmax. A first step uses a subset of the terminology to finetune the embeddings and train the model. A second step adds the entire target terminology, and the model is trained further with hard negative selection and softmax sampling. Results: On two corpora from the Quaero FrenchMed benchmark, we show that our approach can lead to good results even with no labeled data at all, and that it outperforms existing supervised methods with labeled data. Discussion: Training the system with both French and English terms improves by a large margin the performance of the system on a French benchmark, regardless of the way the embeddings were pretrained (French, English, multilingual). Our distantly supervised method can be applied to any kind of documents or medical domain, as it does not require any concept-labeled documents. Conclusion: These experiments pave the way for simpler and more effective multilingual approaches to processing medical texts in languages other than English. (Copyright © 2021 Elsevier Inc. All rights reserved.)
- Published
- 2021
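The abstract above describes classifying encoded mentions "via cosine similarity and softmax". A minimal Python sketch of that idea, under stated assumptions, is given below: the two-dimensional vectors stand in for contextualized embeddings, the concept identifiers are placeholders rather than real UMLS CUIs, and the temperature parameter is an illustrative choice that the abstract does not mention.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(mention_vec, concept_vecs, temperature=0.1):
    """Return the most probable concept id and its softmax probability.

    temperature is a hypothetical scaling factor, not taken from the paper.
    """
    concept_ids = list(concept_vecs)
    scores = [cosine(mention_vec, concept_vecs[c]) / temperature for c in concept_ids]
    probs = softmax(scores)
    best = max(range(len(concept_ids)), key=lambda i: probs[i])
    return concept_ids[best], probs[best]


# Toy embeddings; the concept ids are placeholders, not real CUIs.
concepts = {"CONCEPT_A": [1.0, 0.0], "CONCEPT_B": [0.0, 1.0]}
predicted_id, confidence = classify([1.0, 0.1], concepts)
```

In a trained system the embeddings would come from the finetuned encoder and the candidate set would span the whole target terminology, which is why the abstract mentions hard negative selection and softmax sampling.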