Adaptation of a Term Extractor to Arabic Specialised Texts: First Experiments and Limits

Authors :
Neifar, Wafa
Hamon, Thierry
Zweigenbaum, Pierre
Ellouze Khemakhem, Mariem
Hadrich Belguith, Lamia
Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI)
Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919)
Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11)
Multimedia, InfoRmation systems and Advanced Computing Laboratory (MIRACL)
Faculté des Sciences Economiques et de Gestion de Sfax (FSEG Sfax)
Université de Sfax - University of Sfax
Université Paris 13 (UP13)
Springer
Source :
BASE-Bielefeld Academic Search Engine, International Conference on Intelligent Text Processing and Computational Linguistics, Springer, Jan 2016, Konya, Turkey

Abstract

In this paper, we present an adaptation of a French and English term extractor to Modern Standard Arabic. The goal of this work is to help address the lack of resources and NLP tools for the Arabic language in specialised domains. The adaptation first focuses on describing extraction processes similar to those already defined for French and English, while taking into account the morpho-syntactic specificities of Arabic. Agglutination phenomena are also taken into account in the term extraction process. The current state of the adapted system was evaluated on a medical text corpus: 400 maximal candidate terms were examined, of which 288 were correct (72% precision). An error analysis shows that term extraction errors are due first to Part-of-Speech tagging errors and the difficulties induced by non-diacritised texts, and then to remaining agglutination phenomena.

Details

Database :
OpenAIRE
Journal :
BASE-Bielefeld Academic Search Engine, International Conference on Intelligent Text Processing and Computational Linguistics, Springer, Jan 2016, Konya, Turkey
Accession number :
edsair.dedup.wf.001..38f1edde961a36ed0b397565cadc48c2