Adapting a decision tree based tagger for Arabic
- Source :
- 2016 International Conference on Information Technology for Organizations Development (IT4OD).
- Publication Year :
- 2016
- Publisher :
- IEEE, 2016.
Abstract
- Several probabilistic methods used for part-of-speech (POS) tagging are based on Hidden Markov Models (HMMs). These methods have difficulty, in particular, estimating transition probabilities accurately from limited amounts of training data. A newer method was therefore proposed to avoid the problems HMMs face: it estimates the transition probabilities using a decision tree. A language-independent POS tagger based on this method, called TreeTagger, has been implemented. The main purpose of this work is to create the language model needed to adapt TreeTagger for Arabic POS tagging and lemmatization. To this end, several configuration steps were carried out, namely collecting lexical resources and annotated training corpora. In addition, we used the proposed universal tagset, which consists of common POS categories shared by 22 different languages, including Arabic. We illustrate the use of this tagger through various experiments on vowelled and unvowelled text from both Modern Standard Arabic and Classical Arabic. The accuracy rates obtained are 99.4% for the vowelled Quranic corpus "Al-Mus'haf", 92.6% for the unvowelled "Al-Mus'haf1" corpus, and 81.9% for the NEMLAR corpus.
- Subjects :
- business.industry
Arabic
Computer science
Speech recognition
Lemmatisation
Decision tree
02 engineering and technology
computer.software_genre
Part of speech
language.human_language
030507 speech-language pathology & audiology
03 medical and health sciences
0202 electrical engineering, electronic engineering, information engineering
language
Modern Standard Arabic
020201 artificial intelligence & image processing
Artificial intelligence
Language model
0305 other medical science
Hidden Markov model
business
Classical Arabic
computer
Natural language processing
Lemma (morphology)
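The abstract notes that TreeTagger estimates transition probabilities with a decision tree rather than from raw HMM relative frequencies. The Python below is a minimal, simplified sketch of that idea, not TreeTagger's actual implementation. It assumes trigram contexts (the two preceding tags), binary tests of the form "tag at context position i equals T", an information-gain splitting criterion, and a hypothetical `min_samples` stopping threshold; it omits pruning, the lexicon, and Viterbi decoding. The helper names (`build_tree`, `trigram_samples`) and the toy tag sequences are invented for illustration, using tags from the universal tagset.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# A context is (t_{n-2}, t_{n-1}); the target is t_n. "<s>" pads sentence starts.
Context = Tuple[str, str]
Sample = Tuple[Context, str]


def entropy(counts: Counter) -> float:
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


class Node:
    """Binary decision-tree node estimating P(t_n | t_{n-2}, t_{n-1})."""

    def __init__(self, samples: List[Sample], min_samples: int) -> None:
        self.test: Optional[Tuple[int, str]] = None  # (context position, tag)
        self.yes: Optional["Node"] = None
        self.no: Optional["Node"] = None
        counts = Counter(target for _, target in samples)
        total = sum(counts.values())
        # Relative frequency of each target tag among the samples at this node.
        self.dist = {tag: c / total for tag, c in counts.items()}
        if len(samples) >= min_samples:
            self._split(samples, counts, min_samples)

    def _split(self, samples: List[Sample], counts: Counter, min_samples: int) -> None:
        base, n = entropy(counts), len(samples)
        best_gain, best_test = 1e-12, None  # require a strictly positive gain
        candidates = {(pos, ctx[pos]) for ctx, _ in samples for pos in (0, 1)}
        for pos, tag in sorted(candidates):
            yes = Counter(t for c, t in samples if c[pos] == tag)
            no = Counter(t for c, t in samples if c[pos] != tag)
            if not yes or not no:
                continue  # the test does not separate anything
            remainder = (sum(yes.values()) / n) * entropy(yes) + (sum(no.values()) / n) * entropy(no)
            if base - remainder > best_gain:
                best_gain, best_test = base - remainder, (pos, tag)
        if best_test is None:
            return  # no informative test: this node stays a leaf
        pos, tag = best_test
        self.test = best_test
        self.yes = Node([s for s in samples if s[0][pos] == tag], min_samples)
        self.no = Node([s for s in samples if s[0][pos] != tag], min_samples)

    def prob(self, context: Context, tag: str) -> float:
        # Follow the tests down to a leaf; unseen contexts still reach a leaf.
        node = self
        while node.test is not None:
            pos, t = node.test
            node = node.yes if context[pos] == t else node.no
        return node.dist.get(tag, 0.0)


def trigram_samples(sentences: Sequence[Sequence[str]]) -> List[Sample]:
    samples: List[Sample] = []
    for tags in sentences:
        padded = ["<s>", "<s>"] + list(tags)
        for i in range(2, len(padded)):
            samples.append(((padded[i - 2], padded[i - 1]), padded[i]))
    return samples


def build_tree(sentences: Sequence[Sequence[str]], min_samples: int) -> Node:
    return Node(trigram_samples(sentences), min_samples)


if __name__ == "__main__":
    toy = [["DET", "NOUN", "VERB"], ["DET", "NOUN", "VERB"], ["PRON", "VERB"]]
    tree = build_tree(toy, min_samples=1)
    print(tree.prob(("<s>", "DET"), "NOUN"))  # 1.0
    print(build_tree(toy, min_samples=100).prob(("DET", "NOUN"), "VERB"))  # 0.375
```

The point of the tree is that a context never seen in training, such as `("NOUN", "ADJ")`, still falls into some leaf and receives a distribution pooled from similar contexts, which is how this approach sidesteps the sparse-data problem the abstract attributes to HMM transition estimates.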
Details
- Database :
- OpenAIRE
- Journal :
- 2016 International Conference on Information Technology for Organizations Development (IT4OD)
- Accession number :
- edsair.doi...........ff52e3002282e81a5c4bb7ed39bf2f0d
- Full Text :
- https://doi.org/10.1109/it4od.2016.7479306