Annotation manuelle d'occurrences de candidats termes et écrit scientifique [Manual annotation of candidate-term occurrences and scientific writing]

Authors :: Jacquey, Evelyne
Kister, Laurence
Méoni, Simon
Barreaux, Sabine
Noûs, Camille
Analyse et Traitement Informatique de la Langue Française (ATILF)
Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
Publication Year :: 2021
Publisher :: HAL CCSD, 2021.
Abstract: This paper compares two successive annotation campaigns aimed at manually identifying the occurrences of candidate terms that actually belong to the scientific discipline of the annotated article. The two campaigns differ in their objectives. The first aimed to enrich existing terminological resources. The second had the dual objective of comparing several annotation tools (BRAT, GATE, GLOZZ) and of measuring the difficulty of the annotation task in the human and social sciences compared with the so-called hard sciences. Because the corpora produced do not allow the two campaigns to be compared directly, we use them as training corpora in a test task that consists of automating the manual annotation. The goal is to determine whether the corpus from the second campaign yields better performance on the test task than the corpus from the first.