Moroccan Arabic vocabulary generation using a rule-based approach.

Authors :
Tachicart, Ridouane
Bouzoubaa, Karim
Source :
Journal of King Saud University - Computer & Information Sciences; Nov2022:Part A, Vol. 34 Issue 10, p8538-8548, 11p
Publication Year :
2022

Abstract

NLP resources play a crucial role in building many NLP applications. Their value depends not only on their size and coverage but also on the richness and precision of the annotations they provide. For resource-scarce languages such as Moroccan Arabic (MA), the lack of such resources limits the development of NLP applications. To overcome this problem, we follow a rule-based approach to generate a Moroccan morphological vocabulary (MORV), which constitutes the first step toward addressing the problem of Moroccan morphological generation. MORV is designed and implemented around two main components: on the one hand, an MA lexicon and a list of fully annotated affixes and clitics that we created specifically to support the generation process; on the other hand, a set of rules covering the concatenation and the orthographic adjustment of the generated words. Starting from base forms, MORV outputs more than 4.5 M Moroccan words annotated with rich morphological features such as tense, gender, number, and state. We tested MORV on texts collected from Moroccan social media and found that it reaches a vocabulary coverage of 84% and a precision of 94%. The system can benefit other NLP applications such as spell checking, morphological analysis, and machine translation. [ABSTRACT FROM AUTHOR]
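
The two components described in the abstract (annotated lexicon and affix lists, plus concatenation and orthographic-adjustment rules) can be illustrated with a minimal sketch. This is not the authors' implementation: every stem, affix, feature label, and the single adjustment rule below are invented placeholders rather than real Moroccan Arabic data, and the coverage function assumes coverage means the share of distinct corpus tokens found in the generated vocabulary, which the abstract does not define.

```python
from itertools import product

# Invented placeholder data: NOT real Moroccan Arabic morphemes.
LEXICON = [
    {"stem": "baka", "pos": "verb"},
    {"stem": "tol", "pos": "noun"},
]
PREFIXES = [
    {"form": "", "pos": {"verb", "noun"}, "feats": {}},
    {"form": "na", "pos": {"verb"}, "feats": {"tense": "present"}},
]
SUFFIXES = [
    {"form": "", "pos": {"verb", "noun"}, "feats": {"number": "singular"}},
    {"form": "at", "pos": {"verb", "noun"}, "feats": {"number": "plural"}},
]


def adjust(left, right):
    """Toy orthographic rule: drop a duplicated vowel at a morpheme boundary."""
    if left and right and left[-1] == right[0] and left[-1] in "aeiou":
        return left + right[1:]
    return left + right


def generate(lexicon, prefixes, suffixes):
    """Concatenate every compatible prefix + stem + suffix combination."""
    vocab = {}
    for entry, pre, suf in product(lexicon, prefixes, suffixes):
        # Compatibility rule: each affix must accept the stem's part of speech.
        if entry["pos"] not in pre["pos"] or entry["pos"] not in suf["pos"]:
            continue
        word = adjust(adjust(pre["form"], entry["stem"]), suf["form"])
        # Store the morphological annotation alongside the surface form.
        vocab[word] = {
            "stem": entry["stem"],
            "pos": entry["pos"],
            **pre["feats"],
            **suf["feats"],
        }
    return vocab


def coverage(vocab, tokens):
    """Assumed definition: fraction of distinct tokens present in the vocabulary."""
    distinct = set(tokens)
    if not distinct:
        return 0.0
    return sum(1 for token in distinct if token in vocab) / len(distinct)


vocab = generate(LEXICON, PREFIXES, SUFFIXES)
sample_coverage = coverage(vocab, ["baka", "tolat", "zzz", "baka"])
```

In this toy setting, the verb-only prefix is never attached to the noun stem, and the adjustment rule merges the doubled vowel where a stem ending in a vowel meets a suffix starting with the same vowel. A realistic system would need a far larger rule set and real annotated data, neither of which can be inferred from the abstract alone.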

Details

Language :
English
ISSN :
13191578
Volume :
34
Issue :
10
Database :
Supplemental Index
Journal :
Journal of King Saud University - Computer & Information Sciences
Publication Type :
Academic Journal
Accession number :
160169854
Full Text :
https://doi.org/10.1016/j.jksuci.2021.02.013