246 results for "Hamon, Thierry"
Search Results
202. Droit rural
- Author
Hamon, Thierry, primary
- Published
- 2006
- Full Text
- View/download PDF
203. Les relations dans les terminologies structurées : de la théorie à la pratique
- Author
Grabar, Natalia, primary and Hamon, Thierry, additional
- Published
- 2004
- Full Text
- View/download PDF
204. Event-based information extraction for the biomedical domain
- Author
Alphonse, Erick, primary, Vetah, Mohamed Ould Abdel, additional, Poibeau, Thierry, additional, Weissenbacher, Davy, additional, Aubin, Sophie, additional, Bessières, Philippe, additional, Bisson, Gilles, additional, Hamon, Thierry, additional, Lagarrigue, Sandrine, additional, Nazarenko, Adeline, additional, Manine, Alain-Pierre, additional, and Nédellec, Claire, additional
- Published
- 2004
- Full Text
- View/download PDF
205. Terminology Structuring Through the Derivational Morphology.
- Author
Salakoski, Tapio, Ginter, Filip, Pyysalo, Sampo, Pahikkala, Tapio, Grabar, Natalia, and Hamon, Thierry
- Abstract
In this work, we address the deciphering of semantic relations between terms in order to build structured terminologies. We particularly study the contribution of morphological clues. Among the linguistic operations offered by morphology, we analyze affixation and suppletion. We show the interpretative schemata that emerge from morphologically formed lexemes and the corresponding terminological relations. Morphology appears to be a useful tool for deciphering semantic relations between terms. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
206. Improving Term Extraction with Terminological Resources.
- Author
Salakoski, Tapio, Ginter, Filip, Pyysalo, Sampo, Pahikkala, Tapio, Aubin, Sophie, and Hamon, Thierry
- Abstract
Studies of different term extractors on a corpus from the biomedical domain revealed decreasing performance when applied to highly technical texts. Facing the difficulty or impossibility of customizing existing tools, we developed a tunable term extractor. It exploits linguistic-based rules in combination with the reuse of existing terminologies, i.e. exogenous disambiguation. Experiments reported here show that combining the two strategies allows the extraction of a greater number of term candidates with a higher level of reliability. We further describe the extraction process, involving both endogenous and exogenous disambiguation, implemented in the term extractor YaTeA. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
207. A Scalable and Distributed NLP Architecture for Web Document Annotation.
- Author
Salakoski, Tapio, Ginter, Filip, Pyysalo, Sampo, Pahikkala, Tapio, Deriviere, Julien, Hamon, Thierry, and Nazarenko, Adeline
- Abstract
In the context of the ALVIS project, which aims at integrating linguistic information in topic-specific search engines, we develop an NLP architecture to linguistically annotate large collections of web documents. This context leads us to face the scalability aspect of Natural Language Processing. The platform can be viewed as a framework using existing NLP tools. We focus on the efficiency of the platform by distributing linguistic processing over several machines. We carried out an experiment on 55,329 web documents focusing on biology. This 79-million-word collection of web documents was processed in 3 days on 16 computers. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
208. A step towards the detection of semantic variants of terms in technical documents
- Author
Hamon, Thierry, primary, Nazarenko, Adeline, additional, and Gros, Cécile, additional
- Published
- 1998
- Full Text
- View/download PDF
209. Grouping the pharmacovigilance terms with a hybrid approach.
- Author
Dupuch, Marie, Dupuch, Laëtitia, Perinet, Amandine, Hamon, Thierry, and Grabar, Natalia
- Published
- 2012
210. Grouping the Pharmacovigilance terms with a Hybrid Approach.
- Author
Mantas, John, Andersen, Stig Kjær, Mazzoleni, Maria Christina, Blobel, Bernd, Quaglini, Silvana, Moen, Anne, Dupuch, Marie, Dupuch, Laëtitia, Perinet, Amandine, Hamon, Thierry, and Grabar, Natalia
- Abstract
Pharmacovigilance is the activity related to the collection, analysis and prevention of adverse drug reactions (ADRs) induced by drugs. It underpins the safety surveillance of pharmaceutical products. The pharmacovigilance process benefits from traditional statistical approaches and also from qualitative information on semantic relations between close ADR terms, such as SMQs or the hierarchical levels of MedDRA. In this work, our objective is to detect semantic relatedness between ADR MedDRA terms. To achieve this, we combine two approaches: semantic similarity algorithms computed within structured resources, and terminology structuring methods applied to a raw list of MedDRA terms. We compare these methods and study their differences and complementarity. The results are evaluated against a gold standard manually compiled within the pharmacovigilance area and also with an expert. The combination of the methods leads to improved recall. [ABSTRACT FROM AUTHOR]
- Published
- 2012
211. Identification of relations between risk factors and their pathologies or health conditions by mining scientific literature.
- Author
Safran, C., Reti, S., Marin, H.F., Hamon, Thierry, Graña, Martin, Raggio, Víctor, Grabar, Natalia, and Naya, Hugo
- Abstract
Risk factor discovery and prevention is an active research field within the biomedical domain. Despite abundant existing information on risk factors, as found in bibliographical databases or on several websites, accessing this information may be difficult. Methods from Natural Language Processing and Information Extraction can help access it more easily. Specifically, we show a procedure for analyzing massive amounts of scientific literature and for detecting linguistically marked associations between pathologies and risk factors. This approach allowed us to extract over 22,000 risk factors and associated pathologies. The evaluations pointed out that (1) over 88% of risk factors for coronary heart disease are correct, (2) associated pathologies, when they could be compared to MeSH indexing, are correct in about 70% of cases, and (3) in existing terminologies, links between risk factors and their pathologies are seldom recorded. [ABSTRACT FROM AUTHOR]
- Published
- 2010
212. Exploitation of linguistic indicators for automatic weighting of synonyms induced within three biomedical terminologies.
- Author
Safran, C., Reti, S., Marin, H.F., Grabar, Natalia, and Hamon, Thierry
- Abstract
Acquisition and enrichment of lexical resources is an important research area for computational linguistics. We propose a method for inducing a lexicon of synonyms and for weighting it in order to establish its reliability. The method is based on the analysis of the syntactic structure of complex terms. We apply and evaluate the approach on three biomedical terminologies (MeSH, Snomed Int, Snomed CT). Between 7.7 and 33.6% of the induced synonyms are ambiguous and co-occur with other semantic relations. A virtual reference allows us to validate 9 to 14% of the induced synonyms. [ABSTRACT FROM AUTHOR]
- Published
- 2010
213. Automatic acquisition of synonyms from French UMLS for enhanced search of EHRs.
- Author
Andersen, Stig Kjær, Klein, Gunnar O., Schulz, Stefan, Aarts, Jos, Mazzoleni, M. Cristina, Grabar, Natalia, Varoutas, Paul-Christophe, Rizand, Philippe, Livartowski, Alain, and Hamon, Thierry
- Abstract
Currently, the use of Natural Language Processing (NLP) approaches to improve search and exploration of electronic health records (EHRs) within healthcare information systems is not a common practice. One reason for this is the lack of suitable lexical resources: various types of such resources need to be collected or acquired. In this work, we propose a novel method for the acquisition of synonym resources. This method is language-independent and relies on the existence of structured terminologies. It makes it possible to decipher hidden synonymy relations between simple words and terms on the basis of their syntactic analysis and the exploitation of their compositionality. Applied to series of synonym terms from the French subset of the UMLS, the method shows 99% precision. The overlap between the terms inferred in this way and the existing sparse synonym resources is very low. [ABSTRACT FROM AUTHOR]
- Published
- 2008
214. Semantic Relations Between Nominals.
- Author
HAMON, Thierry
- Published
- 2013
215. Détection des couples de termes translittérés à partir d'un corpus parallèle anglais-arabe
- Author
Neifar, Wafa, Hamon, Thierry, Zweigenbaum, Pierre, Ellouze Khemakhem, Mariem, Lamia Hadrich Belguith, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11), Multimedia, InfoRmation systems and Advanced Computing Laboratory (MIRACL), Faculté des Sciences Economiques et de Gestion de Sfax (FSEG Sfax), Université de Sfax - University of Sfax-Université de Sfax - University of Sfax, and Université Paris 13 (UP13)
- Subjects
parallel corpus, [INFO]Computer Science [cs], bilingual term extraction, transliteration, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], word alignment
- Abstract
We present a method for extracting pairs of medical terms transliterated from English into Arabic characters. We propose a process for building the Arabic transliterations of English terms. It relies on a corpus study to create a table of correspondences between English and Arabic characters, as well as on conversion rules that take into account certain particularities of the Arabic language, such as agglutination and the absence of vowel marks. We evaluated the contribution of transliteration to the identification of English-Arabic term pairs in a parallel corpus of medical texts. The results show that among the 137 English-Arabic word pairs extracted, 120 were judged correct (87.59%), of which 107 are pairs of medical terms (89.16% of the correct transliterations and 78.10% of the results).
216. Adaptation of a Term Extractor to Arabic Specialised Texts: First Experiments and Limits
- Author
Neifar, Wafa, Hamon, Thierry, Zweigenbaum, Pierre, Ellouze Khemakhem, Mariem, Lamia Hadrich Belguith, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11), Multimedia, InfoRmation systems and Advanced Computing Laboratory (MIRACL), Faculté des Sciences Economiques et de Gestion de Sfax (FSEG Sfax), Université de Sfax - University of Sfax-Université de Sfax - University of Sfax, Université Paris 13 (UP13), and Springer
- Subjects
Terminology, [INFO]Computer Science [cs], Modern Standard Arabic, Term Extraction, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
- Abstract
International audience; In this paper, we present an adaptation to Modern Standard Arabic of a French and English term extractor. The goal of this work is to reduce the lack of resources and NLP tools for the Arabic language in specialised domains. The adaptation first focuses on describing extraction processes similar to those already defined for French and English, while considering the morpho-syntactic specificities of Arabic. Agglutination phenomena are then taken into account in the term extraction process. The current state of the adapted system was evaluated on a medical text corpus. 400 maximal candidate terms were examined, among which 288 were correct (72% precision). An error analysis shows that term extraction errors are due first to part-of-speech tagging errors and the difficulties induced by non-diacritised texts, and then to remaining agglutination phenomena.
217. Impact de l'agglutination dans l'extraction de termes en arabe standard moderne
- Author
Neifar, Wafa, Hamon, Thierry, Zweigenbaum, Pierre, Ellouze Khemakhem, Mariem, Lamia Hadrich Belguith, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11), Multimedia, InfoRmation systems and Advanced Computing Laboratory (MIRACL), Faculté des Sciences Economiques et de Gestion de Sfax (FSEG Sfax), Université de Sfax - University of Sfax-Université de Sfax - University of Sfax, and Université Paris 13 (UP13)
- Subjects
Agglutination, medical texts, term extraction, [INFO]Computer Science [cs], Arabic language, terminology, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
- Abstract
In this article, we present an adaptation of a term extraction process to Modern Standard Arabic. The adaptation first consisted in describing the term extraction process in a manner similar to the one defined for English and French, while taking into account certain morpho-syntactic particularities of the Arabic language. We then addressed the agglutination phenomenon of Arabic. The evaluation was carried out on a corpus of medical texts. The results show that among the 400 maximal candidate terms analysed, 288 were judged correct with respect to the domain (72.1%). Extraction errors are due to part-of-speech tagging, to the absence of vowel marks in the texts, and also to the complexity of handling agglutination.
218. Medication Extraction and Guessing in Swedish, French and English.
- Author
Hamon, Thierry, Grabar, Natalia, and Kokkinakis, Dimitrios
- Abstract
Extraction of information related to medication is an important task within the biomedical area. Our method is applied to different types of documents in three languages. The results indicate that our approach can efficiently update and enrich existing drug vocabularies. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
219. Similarité sémantique entre phrases : apprentissage par transfert interlingue
- Author
Teissèdre, Charles, Belkacem, Thiziri, Arens, Maxime, Pogodalla, Sylvain, Cardon, Rémi, Grabar, Natalia, Grouin, Cyril, Hamon, Thierry, Synapse Développement, Benzitoun, Christophe, Braud, Chloé, Huber, Laurine, Langlois, David, Ouni, Slim, and Schneider, Stéphane
- Subjects
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], textual semantic similarity, multilingual neural models, cross-lingual transfer learning
- Abstract
In this article, we describe an exploratory approach to training language models to solve sentence-matching tasks on French corpora from the medical domain. We show that, in a context where training data are scarce, it can be worthwhile to perform transfer learning from a language with more abundant training resources towards a target language with less training data (French in our case). The results of our experiments show that multilingual language models can transfer representations from one language to another effectively enough to solve semantic similarity tasks such as those proposed in the 2020 edition of the Défi fouille de texte (DEFT).
- Published
- 2020
220. Participation d’EDF R&D à DEFT 2020
- Author
Cao, Danrun, Benamar, Alexandra, Boumghar, Manel, Bothua, Meryl, Ould Ouali, Lydia, Suignard, Philippe, Pogodalla, Sylvain, Cardon, Rémi, Grabar, Natalia, Grouin, Cyril, Hamon, Thierry, EDF R&D (EDF R&D), EDF (EDF), Benzitoun, Christophe, Braud, Chloé, Huber, Laurine, Langlois, David, Ouni, Slim, and Schneider, Stéphane
- Subjects
semantic graphs, semantic similarity detection, clinical data, information extraction, Word2Vec, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
- Abstract
This paper describes the participation of EDF R&D in the DEFT 2020 evaluation campaign. Our team took part in the three proposed tasks: two tasks on computing semantic similarity between sentences, and one task on fine-grained information extraction covering a dozen categories. No data other than the training data were used. Our team obtained above-average scores on tasks 1 and 2 and ranked 2nd on task 1. The proposed methods are easily transferable to other similarity-detection use cases that may concern several entities of the EDF group. Our participation in task 3 allowed us to test the advantages and limits of the SpaCy tool for information extraction.
- Published
- 2020
221. DOING@DEFT : cascade de CRF pour l'annotation d'entités cliniques imbriquées
- Author
Anne-Lyse Minard, Andréane Roques, Nicolas Hiot, Mirian Halfeld Ferrari Alves, Agata Savary, Laboratoire Ligérien de Linguistique (LLL), Bibliothèque nationale de France (BnF)-Université d'Orléans (UO)-Université de Tours (UT)-Centre National de la Recherche Scientifique (CNRS), Laboratoire d'Informatique Fondamentale d'Orléans (LIFO), Université d'Orléans (UO)-Institut National des Sciences Appliquées - Centre Val de Loire (INSA CVL), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA), Laboratoire d'Informatique Fondamentale et Appliquée de Tours (LIFAT), Université de Tours (UT)-Institut National des Sciences Appliquées - Centre Val de Loire (INSA CVL), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS), Cardon, Rémi, Grabar, Natalia, Grouin, Cyril, Hamon, Thierry, Bibliothèque nationale de France (BnF)-Université d'Orléans (UO)-Université de Tours-Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS)-Université de Tours-Institut National des Sciences Appliquées - Centre Val de Loire (INSA CVL), LLL-CNRS, Université d'Orléans, LIFO, Université d'Orléans, LIFO, Université d'Orléans et Ennov, LIFAT, université de Tours, Benzitoun, Christophe, Braud, Chloé, Huber, Laurine, Langlois, David, Ouni, Slim, Pogodalla, Sylvain, and Schneider, Stéphane
- Subjects
CRF, clinical cases, fine-grained information extraction, machine learning, clinical entities, nested entities, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
- Abstract
National audience; This article presents the system developed by the DOING team for the DEFT 2020 evaluation campaign on semantic similarity and fine-grained information extraction. The team participated only in task 3, "information extraction". We used a cascade of CRFs to annotate the various pieces of information to be identified. We focused on the question of entity nesting and on whether one entity type is relevant for learning to recognise another. We also tested the use of an external resource, MedDRA, to improve the system's performance, as well as a more complex pipeline that does not handle nested entities. We submitted 3 runs and obtained, averaged over all classes, F-measures of 0.64, 0.65 and 0.61.
- Published
- 2020
222. Contextualized French Language Models for Biomedical Named Entity Recognition
- Author
Jenny Copara, Julien Knafou, Nona Naderi, Claudia Moro, Patrick Ruch, Douglas Teodoro, Haute Ecole Spécialisée de Suisse Occidentale (HES-SO), Pontifical Catholic University of Paraná (PUCPR), Pontifical Catholic University of Paraná, Cardon, Rémi, Grabar, Natalia, Grouin, Cyril, Hamon, Thierry, HES-SO -University of Applied Sciences and Arts of Western Switzerland, HES-SO - University of Applied Sciences and Arts of Western Switzerland, PUCPR - Pontifical Catholic University of Paraná, Curitiba, Benzitoun, Christophe, Braud, Chloé, Huber, Laurine, Langlois, David, Ouni, Slim, Pogodalla, Sylvain, and Schneider, Stéphane
- Subjects
Contextualized word embeddings, Named entity recognition, CamemBERT, BERT, CRF, ddc:616.0757, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
- Abstract
Named entity recognition (NER) is key for biomedical applications as it allows knowledge discovery in free-text data. As entities are semantic phrases, their meaning is conditioned by the context to avoid ambiguity. In this work, we explore contextualized language models for NER in French biomedical text as part of the Défi Fouille de Textes challenge. Our best approach achieved an F1-measure of 66% for the symptoms and signs, and pathology categories, ranking top 1 for subtask 1. For the anatomy, dose, exam, mode, moment, substance, treatment, and value categories, it achieved an F1-measure of 75% (subtask 2). Considering all categories, our model achieved the best result in the challenge, with an F1-measure of 72%. The use of an ensemble of neural language models proved to be very effective, improving a CRF baseline by up to 28% and a single specialised language model by 4%.
- Published
- 2020
223. DEFT 2020 : détection de similarité entre phrases et extraction d'information
- Author
Tapi Nzali, Mike, Reezocar, Cardon, Rémi, Grabar, Natalia, Grouin, Cyril, Hamon, Thierry, Pogodalla, Sylvain, Benzitoun, Christophe, Braud, Chloé, Huber, Laurine, Langlois, David, Ouni, Slim, and Schneider, Stéphane
- Subjects
machine learning, semantic similarity detection, information extraction, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
- Abstract
This paper describes Reezocar's participation in the DEFT 2020 evaluation campaign. This sixteenth edition of the challenge dealt with computing similarity between sentences and fine-grained information extraction covering a dozen categories in texts written in French. The challenge proposed three tasks: (i) identifying the degree of similarity between pairs of sentences; (ii) identifying the possible parallel sentences for a source sentence; and (iii) information extraction. We used machine learning methods for these tasks and obtained satisfactory results on all of them.
- Published
- 2020
224. DEFT 2020 - Extraction d’information fine dans les données cliniques : terminologies spécialisées et graphes de connaissance
- Author
Lemaitre, Thomas, Gosset, Camille, Lafourcade, Mathieu, Patel, Namrata, Mayoral, Guilhem, Exploration et exploitation de données textuelles (TEXTE), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Onaos, Université Paul-Valéry - Montpellier 3 (UPVM), Cardon, Rémi, Grabar, Natalia, Grouin, Cyril, and Hamon, Thierry
- Subjects
Fine-grained information extraction, Knowledge graphs, Clinical data, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
- Abstract
National audience; This paper presents our rule-based approach for fine-grained information extraction in clinical data, submitted in response to Task 3 of the DEFT 2020 evaluation campaign. We design (1) a dedicated medical terminology built from existing medical references and (2) a knowledge graph based on the semantically rich knowledge base JeuxDeMots.
- Published
- 2020
225. Fine-grained Information Extraction in Clinical Data: Dedicated Terminologies and Knowledge Graphs
- Author
Lemaitre, Thomas, Gosset, Camille, Lafourcade, Mathieu, Patel, Namrata, Mayoral, Guilhem, Exploration et exploitation de données textuelles (TEXTE), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Onaos, Université Paul-Valéry - Montpellier 3 (UPVM), Cardon, Rémi, Grabar, Natalia, Grouin, Cyril, Hamon, Thierry, Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), University Paul-Valéry Montpellier 3, Benzitoun, Christophe, Braud, Chloé, Huber, Laurine, Langlois, David, Ouni, Slim, Pogodalla, Sylvain, and Schneider, Stéphane
- Subjects
Fine-grained information extraction, Knowledge graphs, Clinical data, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
- Abstract
National audience; This paper presents our rule-based approach for fine-grained information extraction in clinical data, submitted in response to Task 3 of the DEFT 2020 evaluation campaign. We design (1) a dedicated medical terminology built from existing medical references and (2) a knowledge graph based on the semantically rich knowledge base JeuxDeMots.
- Published
- 2020
226. Calcul de similarité entre phrases : quelles mesures et quels descripteurs ?
- Author
Buscaldi, Davide, Felhi, Ghazi, Ghoul, Dhaou, Le Roux, Josepth, Lejeune, Gaël, Zhang, Xudong, Laboratoire d'Informatique de Paris-Nord (LIPN), Centre National de la Recherche Scientifique (CNRS)-Université Sorbonne Paris Nord, Sens, Texte, Informatique, Histoire (STIH), Sorbonne Université (SU), Équipe Linguistique computationnelle (STIH-LC), Sorbonne Université (SU)-Sorbonne Université (SU), Cardon, Rémi, Grabar, Natalia, Grouin, Cyril, Hamon, Thierry, LIPN, Sorbonne Paris Nord, STIH, Sorbonne Université, Benzitoun, Christophe, Braud, Chloé, Huber, Laurine, Langlois, David, Ouni, Slim, Pogodalla, Sylvain, and Schneider, Stéphane
- Subjects
Bray-Curtis distance, character n-grams, Euclidean distance, similarity, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
- Abstract
This article presents our participation in the 2020 edition of the Défi Fouille de Textes (DEFT 2020), and more precisely in the two tasks concerning sentence similarity. In this work we addressed two questions: the choice of the similarity measure on the one hand, and the choice of the operands to which the measure is applied on the other. In particular, we studied whether to use words or character strings (words or non-words). We show, first, that Bray-Curtis similarity can be more effective and, above all, more stable than cosine similarity and, second, that computing similarity over character strings is more effective than the same computation over words.
- Published
- 2020
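The measure comparison described in entry 226 can be sketched in a few lines. This is a minimal illustration of Bray-Curtis versus cosine similarity over character n-gram counts, not the authors' implementation; the example sentences and the trigram length are arbitrary choices for the sketch:

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Count overlapping character n-grams (spaces included)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def bray_curtis(a, b):
    """Bray-Curtis similarity between two count vectors:
    twice the shared counts divided by the total counts."""
    shared = sum(min(v, b[k]) for k, v in a.items() if k in b)
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total if total else 0.0

def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm = (sum(v * v for v in a.values()) ** 0.5) * \
           (sum(v * v for v in b.values()) ** 0.5)
    return dot / norm if norm else 0.0

s1 = char_ngrams("le patient présente une forte fièvre")
s2 = char_ngrams("le patient a de la fièvre")
print(bray_curtis(s1, s2), cosine(s1, s2))
```

Bray-Curtis is bounded in [0, 1] for count vectors and compares raw counts directly, which may be related to the stability the authors report; switching from character strings to words would only change `char_ngrams`.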
227. Expanding a dictionary of marker words for uncertainty and negation using distributional semantics
- Author
Aron Henriksson, Carita Paradis, Roza Baskalayci, Andreas Kerren, Alyaa Alfalahi, Maria Skeppstedt, Rickard Ahlbom, Lars Asker, Grouin, Cyril, Hamon, Thierry, Névéol, Aurélie, and Zweigenbaum, Pierre
- Subjects
Information retrieval, marker words, dictionary expansion, clinical text, distributional semantics, negation, uncertainty, natural language processing, Information Systems
- Abstract
Approaches to determining the factuality of diagnoses and findings in clinical text tend to rely on dictionaries of marker words for uncertainty and negation. Here, a method for semi-automatically expanding a dictionary of marker words using distributional semantics is presented and evaluated. It is shown that ranking candidates for inclusion according to their proximity to cluster centroids of semantically similar seed words is more successful than ranking them according to proximity to each individual seed word.
- Published
- 2015
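The centroid-based ranking evaluated in entry 227 can be sketched with toy vectors. This is a schematic illustration, not the authors' system: the seed and candidate words and their three-dimensional vectors are invented for the example, whereas a real setting would use distributional vectors induced from a clinical corpus:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Mean vector of a cluster of semantically similar seed words."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Hypothetical uncertainty-marker seeds with toy 3-d vectors.
seeds = {"possible": [0.9, 0.1, 0.0], "probable": [0.8, 0.2, 0.1]}
# Candidate words to rank for inclusion in the dictionary.
candidates = {"likely": [0.85, 0.15, 0.05], "fracture": [0.0, 0.1, 0.9]}

c = centroid(list(seeds.values()))
ranked = sorted(candidates, key=lambda w: cosine(candidates[w], c), reverse=True)
print(ranked)
```

Ranking candidates against the cluster centroid, rather than against each seed word separately, is the variant the abstract reports as more successful.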
228. A Scalable and Distributed NLP Architecture for Web Document Annotation
- Author
Thierry Hamon, Julien Derivière, Adeline Nazarenko, Hamon, Thierry, and Tapio Salakoski, Filip Ginter, Sampo Pyysalo, Tapio Pahikkala
- Subjects
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], Deep linguistic processing, Annotation, Text segmentation, Electronic document, NLP architecture, Web documents, World Wide Web, Search engine, Scalability, Natural Language Processing
- Abstract
In the context of the ALVIS project, which aims at integrating linguistic information in topic-specific search engines, we develop an NLP architecture to linguistically annotate large collections of web documents. This context leads us to address the scalability of Natural Language Processing. The platform can be viewed as a framework using existing NLP tools. We focus on the efficiency of the platform by distributing linguistic processing on several machines. We carried out an experiment on 55,329 web documents focusing on biology. This 79-million-word collection of web documents was processed in 3 days on 16 computers.
- Published
- 2006
- Full Text
- View/download PDF
229. Les relations dans les terminologies structurées : de la théorie à la pratique
- Author
-
Natalia Grabar, Thierry Hamon, Ingenierie des connaissances en santé, IFR58-Université Paris Descartes - Paris 5 (UPD5)-Institut National de la Santé et de la Recherche Médicale (INSERM), Laboratoire d'Informatique de Paris-Nord (LIPN), Université Sorbonne Paris Cité (USPC)-Institut Galilée-Université Paris 13 (UP13)-Centre National de la Recherche Scientifique (CNRS), IFR58-Université Paris Descartes - Paris 5 ( UPD5 ) -Institut National de la Santé et de la Recherche Médicale ( INSERM ), Laboratoire d'Informatique de Paris-Nord ( LIPN ), Université Paris 13 ( UP13 ) -Université Sorbonne Paris Cité ( USPC ) -Institut Galilée-Centre National de la Recherche Scientifique ( CNRS ), and Hamon, Thierry
- Subjects
060201 languages & linguistics ,[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI] ,relations transversales ,relations taxinomiques ,06 humanities and the arts ,16. Peace & justice ,acquisition de relations entre termes ,030205 complementary & alternative medicine ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,03 medical and health sciences ,0302 clinical medicine ,synonymie ,Artificial Intelligence ,0602 languages and literature ,structuration de terminologie ,terminologie ,[ INFO.INFO-AI ] Computer Science [cs]/Artificial Intelligence [cs.AI] ,Software - Abstract
In this article, we focus on the structuring of the terms of a domain, that is, on the acquisition of relations between terms. Terminologies can no longer merely list terms and organize them briefly into a hierarchy. They must also offer a whole range of relations that best reflect the knowledge of the domain and adequately answer the needs of applications. We confront the place traditionally granted to relations by terminological theory with the actual needs that arise during the construction, use, and reuse of terminological resources. We also present an overview of the approaches proposed for acquiring relations between terms from specialized corpora.
- Published
- 2004
230. Detection of synonymy links between terms: experiment and results
- Author
-
Thierry Hamon, Adeline Nazarenko, Hamon, Thierry, and Didier Bourigault, Christian Jacquemin and Marie-Claude L'Homme
- Subjects
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI] ,corpora ,synonymy relation ,Terminology ,Natural Language Processing
- Published
- 2001
231. A step towards the detection of semantic variants of terms in technical documents
- Author
-
Thierry Hamon, Adeline Nazarenko, Cecile Gros, Laboratoire d'Informatique de Paris-Nord (LIPN), Université Paris 13 (UP13)-Institut Galilée-Université Sorbonne Paris Cité (USPC)-Centre National de la Recherche Scientifique (CNRS), EDF R&D (EDF R&D), EDF (EDF), and Hamon, Thierry
- Subjects
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI] ,Computer science ,Synonym ,synonymy ,02 engineering and technology ,010501 environmental sciences ,computer.software_genre ,NLP ,01 natural sciences ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,Terminology ,03 medical and health sciences ,0302 clinical medicine ,0202 electrical engineering, electronic engineering, information engineering ,0105 earth and related environmental sciences ,060201 languages & linguistics ,Information retrieval ,business.industry ,06 humanities and the arts ,Technical documentation ,030205 complementary & alternative medicine ,0602 languages and literature ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
This paper reports the results of a preliminary experiment on the detection of semantic variants of terms in a French technical document. The general goal of our work is to help structure terminologies. Two kinds of semantic variants can be found in traditional terminologies: strict synonymy links and fuzzier relations such as see-also. We have designed three rules which exploit general dictionary information to infer synonymy relations between complex candidate terms. The results have been examined by a human terminologist. The expert judged that half of the overall pairs of terms are relevant for semantic variation, and validated a large proportion of the detected links as synonymy. Moreover, it appeared that numerous errors are due to a few mis-interpreted links: they could be eliminated by a few exception rules.
- Published
- 1998
- Full Text
- View/download PDF
232. Automated classification of textual documents based on a controlled vocabulary in engineering
- Author
-
Thierry Hamon, Anders Ardö, Koraljka Golub, and Hamon, Thierry
- Subjects
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI] ,Thesaurus (information retrieval) ,business.industry ,Computer science ,Speech recognition ,Subject (documents) ,Library and Information Sciences ,computer.software_genre ,Classification ,Term (time) ,Weighting ,Information engineering ,If and only if ,Controlled vocabulary ,Information Retrieval ,Index term ,Artificial intelligence ,business ,computer ,Natural language processing ,Natural Language Processing - Abstract
Automated subject classification has been a challenging research issue for many years now, receiving particular attention in the past decade due to the rapid increase in digital documents. The most frequent approach to automated classification is machine learning. It, however, requires training documents and performs well on new documents only if these are similar enough to the former. We explore a string-matching algorithm based on a controlled vocabulary, which does not require training documents – instead it reuses the intellectual work put into creating the controlled vocabulary. Terms from the Engineering Information thesaurus and classification scheme were matched against the title and abstract of engineering papers from the Compendex database. Simple string-matching was enhanced by several methods such as term weighting schemes and cut-offs, exclusion of certain terms, and enrichment of the controlled vocabulary with automatically extracted terms. The best results are 76% recall when the controlled vocabulary is enriched with new terms, and 79% precision when certain terms are excluded. Precision of individual classes is up to 98%. These results are comparable to state-of-the-art machine-learning algorithms.
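A minimal sketch of the string-matching classification the abstract describes; the vocabulary entries, weights, class labels, and cut-off below are invented, and the actual system's weighting schemes are considerably richer:

```python
import re

def classify(text, vocabulary, cutoff=1.0):
    """Score classes by weighted matches of controlled-vocabulary terms
    in the text; return classes whose score reaches the cut-off."""
    text = text.lower()
    scores = {}
    for term, (cls, weight) in vocabulary.items():
        hits = len(re.findall(r"\b" + re.escape(term) + r"\b", text))
        if hits:
            scores[cls] = scores.get(cls, 0.0) + hits * weight
    return [c for c, s in sorted(scores.items(), key=lambda kv: -kv[1])
            if s >= cutoff]

# Hypothetical thesaurus fragment: multiword terms weighted higher
vocab = {
    "heat exchanger": ("641.2 Heat Transfer", 2.0),
    "corrosion": ("539 Metal Corrosion", 1.0),
}
doc = "Corrosion of the heat exchanger tubes was observed."
print(classify(doc, vocab))
```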
233. Participation de l’équipe du LIMICS à DEFT 2020
- Author
-
Perceval Wajsbürt, Yoann Taillé, Guillaume Lainé, Xavier Tannier, Sorbonne Université, Inserm, LIMICS, Benzitoun, Christophe, Braud, Chloé, Huber, Laurine, Langlois, David, Ouni, Slim, Pogodalla, Sylvain, Schneider, Stéphane, Laboratoire d'Informatique Médicale et Ingénierie des Connaissances en e-Santé (LIMICS), Institut National de la Santé et de la Recherche Médicale (INSERM)-Sorbonne Université (SU)-Université Sorbonne Paris Nord, Institut des Sciences du Calcul et des Données (ISCD), Sorbonne Université (SU), Cardon, Rémi, Grabar, Natalia, Grouin, Cyril, and Hamon, Thierry
- Subjects
Named entity recognition ,nested entities ,deep learning ,[INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL] ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] - Abstract
In this article, we present the methods designed and the results obtained during our participation in Task 3 of the DEFT 2020 evaluation campaign, which consisted of named entity recognition in the medical domain. We propose two different models that take nested entities into account, one of the difficulties of the proposed dataset, and present the results obtained. Our best run achieved the best performance among the participants on one of the two subtasks of the challenge.
234. Automatic Prediction of Semantic Labels for French Medical Terms.
- Author
-
Hamon T and Grabar N
- Subjects
- Natural Language Processing, Semantics, Unified Medical Language System
- Abstract
We address the problem of semantic labeling of terms in two French medical corpora with a subset of the UMLS. We perform two experiments relying on the structure of words and terms, and on their context: 1) the semantic label of already identified terms is predicted; 2) the terms are detected in raw texts and their semantic label is predicted. Our results show an F-measure above 0.90.
- Published
- 2022
- Full Text
- View/download PDF
235. Visualizing Food-Drug Interactions in the Thériaque Database.
- Author
-
Lalanne F, Bedouch P, Simonnet C, Depras V, Bordea G, Bourqui R, Hamon T, Thiessard F, and Mougin F
- Subjects
- Animals, Databases, Factual, Mice, Food-Drug Interactions
- Abstract
This paper presents a prototype for the visualization of food-drug interactions implemented in the MIAM project, whose objective is to develop methods for the extraction and representation of these interactions and to make them available in the Thériaque database. The prototype provides users with a graphical visualization showing the hierarchies of drugs and foods in front of each other and the links between them representing the existing interactions as well as additional details about them, including the number of articles reporting the interaction. The prototype is interactive in the following ways: hierarchies can be easily folded and unfolded, a filter can be applied to view only certain types of interactions, and details about a given interaction are displayed when the mouse is moved over the corresponding link. Future work includes proposing a version more suitable for non-health professional users and the representation of the food hierarchy based on a reference classification.
- Published
- 2021
- Full Text
- View/download PDF
236. Generalizability of Readability Models for Medical Terms.
- Author
-
Pylieva H, Chernodub A, Grabar N, and Hamon T
- Subjects
- Language, Natural Language Processing, Supervised Machine Learning, Algorithms, Comprehension
- Abstract
Detecting words that are difficult to understand is a crucial task for ensuring the proper comprehension of medical texts such as diagnoses and drug instructions. We propose to combine supervised machine learning algorithms using various features with word embeddings, which carry contextual information about words. Data in French were manually cross-annotated by seven annotators. On the basis of these data, we propose cross-validation scenarios to test the ability of models to generalize when detecting the difficulty of medical words. On the data provided by the seven annotators, we show that the models generalize from one annotator to another.
- Published
- 2019
- Full Text
- View/download PDF
237. User Profile Detection in Health Online Fora.
- Author
-
Pertin C, Deccache C, Gagnayre R, and Hamon T
- Subjects
- Humans, Internet, Social Media, Data Mining, Diabetes Mellitus, Patients
- Abstract
Exchanges between diabetic patients on discussion fora make it possible to study their understanding of their disorder, and their behavior and needs when facing health problems. When analyzing these exchanges and behaviors, it is necessary to collect information on the user profiles. We present an approach combining a lexicon and supervised classifiers for identifying the age and gender of contributors, their disorders, and the relation between contributor and patient. Depending on the parameters of the method, precision ranges from 53.48% for disorders to 100% for gender.
- Published
- 2018
238. Clinical Information Extraction at the CLEF eHealth Evaluation lab 2016.
- Author
-
Névéol A, Cohen KB, Grouin C, Hamon T, Lavergne T, Kelly L, Goeuriot L, Rey G, Robert A, Tannier X, and Zweigenbaum P
- Abstract
This paper reports on Task 2 of the 2016 CLEF eHealth evaluation lab which extended the previous information extraction tasks of ShARe/CLEF eHealth evaluation labs. The task continued with named entity recognition and normalization in French narratives, as offered in CLEF eHealth 2015. Named entity recognition involved ten types of entities including disorders that were defined according to Semantic Groups in the Unified Medical Language System® (UMLS®), which was also used for normalizing the entities. In addition, we introduced a large-scale classification task in French death certificates, which consisted of extracting causes of death as coded in the International Classification of Diseases, tenth revision (ICD10). Participant systems were evaluated against a blind reference standard of 832 titles of scientific articles indexed in MEDLINE, 4 drug monographs published by the European Medicines Agency (EMEA) and 27,850 death certificates using Precision, Recall and F-measure. In total, seven teams participated, including five in the entity recognition and normalization task, and five in the death certificate coding task. Three teams submitted their systems to our newly offered reproducibility track. For entity recognition, the highest performance was achieved on the EMEA corpus, with an overall F-measure of 0.702 for plain entities recognition and 0.529 for normalized entity recognition. For entity normalization, the highest performance was achieved on the MEDLINE corpus, with an overall F-measure of 0.552. For death certificate coding, the highest performance was 0.848 F-measure.
- Published
- 2016
239. Health consumer-oriented information retrieval.
- Author
-
Claveau V, Hamon T, Le Maguer S, and Grabar N
- Subjects
- Machine Learning, Patient Access to Records, Consumer Health Information organization & administration, Data Mining methods, Electronic Health Records organization & administration, Health Information Systems organization & administration, Natural Language Processing, User-Computer Interface
- Abstract
While patients can freely access their Electronic Health Records or online health information, they may not be able to correctly understand the content of these documents. One of the challenges is related to the difference between expert and non-expert languages. We propose to investigate this issue within the Information Retrieval field. The patient queries have to be associated with the corresponding expert documents, that provide trustworthy information. Our approach relies on a state-of-the-art IR system called Indri and on semantic resources. Different query expansion strategies are explored. Our system shows up to 0.6740 P@10, up to 0.7610 R@10, and up to 0.6793 NDCG@10.
- Published
- 2015
240. Generating and Executing Complex Natural Language Queries across Linked Data.
- Author
-
Hamon T, Mougin F, and Grabar N
- Subjects
- Databases, Factual, Humans, Semantics, Information Storage and Retrieval methods, Natural Language Processing
- Abstract
With the recent and intensive research in the biomedical area, the knowledge accumulated is disseminated through various knowledge bases. Links between these knowledge bases are needed in order to use them jointly. Linked Data, the SPARQL language, and natural-language question-answering interfaces provide interesting solutions for querying such knowledge bases. We propose a method for translating natural language questions into SPARQL queries. We use Natural Language Processing tools, semantic resources, and the RDF triples description. The method is designed on 50 questions over 3 biomedical knowledge bases, and evaluated on 27 questions. It achieves 0.78 F-measure on the test set. The method for translating natural language questions into SPARQL queries is implemented as a Perl module available at http://search.cpan.org/~thhamon/RDF-NLP-SPARQLQuery.
- Published
- 2015
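The translation of a question into a SPARQL query can be illustrated roughly as below; the semantic-frame shape and the `ex:` predicate and entity names are hypothetical, not the actual API of the Perl module:

```python
def build_sparql(frame):
    """Assemble a SPARQL SELECT query from a semantic frame extracted
    from a natural-language question (illustrative sketch)."""
    lines = ["SELECT DISTINCT ?answer WHERE {"]
    for subj, pred, obj in frame["triples"]:
        lines.append(f"  {subj} {pred} {obj} .")
    lines.append("}")
    return "\n".join(lines)

# e.g. "Which drugs interact with aspirin?"
frame = {"triples": [("?answer", "ex:interactsWith", "ex:aspirin")]}
print(build_sparql(frame))
```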
241. Identification of relations between risk factors and their pathologies or health conditions by mining scientific literature.
- Author
-
Hamon T, Graña M, Raggio V, Grabar N, and Naya H
- Subjects
- Abstracting and Indexing methods, Databases, Bibliographic, Disease, Medical Subject Headings, Natural Language Processing, Risk Factors, Semantics, United States, Data Mining standards
- Abstract
Risk factor discovery and prevention is an active research field within the biomedical domain. Despite abundant existing information on risk factors, as found in bibliographical databases or on several websites, accessing this information may be difficult. Methods from Natural Language Processing and Information Extraction can help access it more easily. Specifically, we show a procedure for analyzing massive amounts of scientific literature and for detecting linguistically marked associations between pathologies and risk factors. This approach allowed us to extract over 22,000 risk factors and associated pathologies. The performed evaluations pointed out that (1) over 88% of risk factors for coronary heart disease are correct, (2) associated pathologies, when they could be compared to MeSH indexing, are correct in about 70% of cases, and (3) in existing terminologies, links between risk factors and their pathologies are seldom recorded.
- Published
- 2010
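One linguistically marked association of the kind mined in the abstract above can be captured with a pattern like the following; the single marker phrase is one illustrative cue, not the authors' full pattern set:

```python
import re

# One illustrative linguistic marker of a risk-factor association
PATTERN = re.compile(
    r"(?P<factor>[\w ]+?) is a risk factor for (?P<pathology>[\w ]+)",
    re.IGNORECASE,
)

def extract(sentence):
    """Return a (risk factor, pathology) pair if the sentence contains
    the marker phrase, otherwise None."""
    m = PATTERN.search(sentence)
    if m:
        return m.group("factor").strip(), m.group("pathology").strip()
    return None

print(extract("Smoking is a risk factor for coronary heart disease."))
```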
242. Exploitation of linguistic indicators for automatic weighting of synonyms induced within three biomedical terminologies.
- Author
-
Grabar N and Hamon T
- Subjects
- Linguistics, Medical Subject Headings, Natural Language Processing, Systematized Nomenclature of Medicine, Semantics
- Abstract
Acquisition and enrichment of lexical resources is an important research area for computational linguistics. We propose a method for inducing a lexicon of synonyms and for weighting it in order to establish its reliability. The method is based on the analysis of the syntactic structure of complex terms. We apply and evaluate the approach on three biomedical terminologies (MeSH, Snomed Int, Snomed CT). Between 7.7 and 33.6% of the induced synonyms are ambiguous and cooccur with other semantic relations. A virtual reference allows the validation of 9 to 14% of the induced synonyms.
- Published
- 2010
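The compositional inference of synonyms from complex terms can be sketched as follows, with toy input pairs; the actual method relies on a full syntactic analysis of the terms rather than this simple token alignment:

```python
def induce_synonyms(term_pairs):
    """Given pairs of complex terms already known to be synonymous,
    infer synonymy between the components in which they differ
    (simplified reading of the compositional analysis)."""
    induced = set()
    for t1, t2 in term_pairs:
        w1, w2 = t1.split(), t2.split()
        if len(w1) != len(w2):
            continue
        diff = [(a, b) for a, b in zip(w1, w2) if a != b]
        if len(diff) == 1:  # the terms differ in exactly one slot
            induced.add(tuple(sorted(diff[0])))
    return induced

# Toy synonymous term pairs (invented for illustration)
pairs = [("hepatic disease", "liver disease"),
         ("cardiac muscle", "heart muscle")]
print(sorted(induce_synonyms(pairs)))
```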
243. Exploitation of speculation markers to identify the structure of biomedical scientific writing.
- Author
-
Grabar N and Hamon T
- Subjects
- Algorithms, Science, Biomedical Research, Writing
- Abstract
The motivation of this work is to study the use of speculation markers within scientific writing: this may be useful for discovering whether these markers are regularly spread across biomedical articles and then for establishing the logical structure of articles. To achieve these objectives, we compute associations between article sections and speculation markers. We use machine learning algorithms to show that there are strong and interesting associations between speculation markers and article structure. For instance, strong markers, which strongly influence the presentation of knowledge, are specific to Results, Discussion and Abstract, while non-strong markers appear with higher regularity within Material and Methods. Our results indicate that speculation is governed by observable usage rules within scientific articles and can help structure them.
- Published
- 2009
244. Automatic acquisition of synonym resources and assessment of their impact on the enhanced search in EHRs.
- Author
-
Grabar N, Varoutas PC, Rizand P, Livartowski A, and Hamon T
- Subjects
- France, Humans, Hospital Information Systems organization & administration, Medical Records Systems, Computerized, Natural Language Processing, Terminology as Topic
- Abstract
Objective: Currently, the use of natural language processing (NLP) approaches in order to improve search and exploration of electronic health records (EHRs) within healthcare information systems is not a common practice. One reason for this is the lack of suitable lexical resources. Indeed, in order to support such tasks, various types of such resources need to be collected or acquired (i.e., morphological, orthographic, synonymous)., Methods: We propose a novel method for the acquisition of synonymy resources. This method is language-independent and relies on the existence of structured terminologies. It enables the deciphering of hidden synonymy relations between simple words and terms on the basis of their syntactic analysis and the exploitation of their compositionality., Results: Applied to series of synonym terms from the French subset of the UMLS, the method shows 99% precision. The overlap between the terms thus inferred and the existing sparse resources of synonyms is very low. In order to better integrate these resources into an EHR search system, we analyzed a sample of clinical queries submitted by healthcare professionals., Conclusions: Observation of clinical queries shows that they make very little use of the query expansion function, and, whenever they do, synonymy relations are rarely involved.
- Published
- 2009
- Full Text
- View/download PDF
245. Combination of endogenous clues for profiling inferred semantic relations: experiments with Gene Ontology.
- Author
-
Grabar N, Jaulent MC, and Hamon T
- Subjects
- Computational Biology methods, Databases as Topic, Gene Expression Profiling methods, Genomics, Information Management, Microarray Analysis methods, Molecular Biology methods, Information Storage and Retrieval methods, Semantics, Vocabulary, Controlled
- Abstract
Acquisition and enrichment of lexical resources is acknowledged as an important research topic in the area of computational linguistics. While such resources are often missing, specialized domains, e.g., biomedicine, provide several structured terminologies. In this paper, we propose a high-quality method for exploiting a structured terminology and inferring an elementary synonym lexicon. The method is based on the analysis of the syntactic structure of complex terms. The inferred synonym pairs are then profiled according to different clues endogenously computed within the same terminology. We apply and evaluate the approach on the Gene Ontology biomedical terminology.
- Published
- 2008
246. Automatic acquisition of synonyms from French UMLS for enhanced search of EHRs.
- Author
-
Grabar N, Varoutas PC, Rizand P, Livartowski A, and Hamon T
- Subjects
- Algorithms, Data Collection, Dictionaries as Topic, France, Knowledge Bases, Unified Medical Language System, Information Storage and Retrieval, Medical Records Systems, Computerized, Multilingualism, Natural Language Processing, Vocabulary, Controlled
- Abstract
Currently, the use of Natural Language Processing (NLP) approaches in order to improve search and exploration of electronic health records (EHRs) within healthcare information systems is not a common practice. One reason for this is the lack of suitable lexical resources: various types of such resources need to be collected or acquired. In this work, we propose a novel method for the acquisition of synonymy resources. This method is language-independent and relies on the existence of structured terminologies. It enables the deciphering of hidden synonymy relations between simple words and terms on the basis of their syntactic analysis and the exploitation of their compositionality. Applied to series of synonym terms from the French subset of the UMLS, the method shows 99% precision. The overlap between the terms thus inferred and the existing sparse resources of synonyms is very low.
- Published
- 2008