Automatic Processing of Modern Standard Arabic Text.

Authors :: Ide, Nancy
Véronis, Jean
Baayen, Harald
Church, Kenneth W.
Klavans, Judith
Barnard, David T.
Tufis, Dan
Llisterri, Joaquim
Johansson, Stig
Mariani, Joseph
Soudi, Abdelhadi
van den Bosch, Antal
Neumann, Günter
Diab, Mona
Hacioglu, Kadri
Jurafsky, Daniel
Source :: Arabic Computational Morphology; 2007, p159-179, 21p
Publication Year :: 2007
Abstract :: To date, there are no fully automated systems addressing the community's need for fundamental language processing tools for Arabic text. In this chapter, we present a Support Vector Machine (SVM) based approach to automatically tokenize (segment off clitics), part-of-speech (POS) tag, and annotate Base Phrase Chunks (BPC) in Modern Standard Arabic (MSA) text. We adapt highly accurate tools that have been developed for English text and apply them to Arabic text. Using standard evaluation metrics, we report that the tokenizer (SVM-TOK) achieves an Fβ=1 score of 99.1, the tagger (SVM-POS) achieves an accuracy of 96.6%, and the chunker (SVM-BPC) yields an Fβ=1 score of 91.6. [ABSTRACT FROM AUTHOR]
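
Note on the Fβ=1 metric: the abstract reports tokenizer and chunker quality as Fβ=1 scores, the weighted harmonic mean of precision and recall with β set to 1. The sketch below shows how such a score could be computed for chunking under exact-match scoring. It is an illustration only, not the evaluation script used in the chapter: the (start, end, label) chunk representation, the function names `f_beta` and `chunk_scores`, and the sample data are all assumptions made for this example.

```python
from typing import Set, Tuple

# Hypothetical chunk representation: (start token index, end token index, label).
Chunk = Tuple[int, int, str]


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is correct
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def chunk_scores(gold: Set[Chunk], predicted: Set[Chunk]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), counting a chunk as correct only
    if its boundaries and label both match a gold chunk exactly."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_beta(precision, recall)


# Invented sample: three gold chunks, three predictions, two exact matches.
gold = {(0, 2, "NP"), (2, 3, "VP"), (3, 5, "NP")}
predicted = {(0, 2, "NP"), (2, 3, "VP"), (3, 4, "NP")}
precision, recall, f1 = chunk_scores(gold, predicted)
```

With β = 1, precision and recall are weighted equally, so a system cannot reach a high score by over-predicting or under-predicting chunks. Scores such as 99.1 and 91.6 are this value expressed on a 0 to 100 scale.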