Persian in MULTEXT-East Framework.

Authors :
Salakoski, Tapio
Ginter, Filip
Pyysalo, Sampo
Pahikkala, Tapio
QasemiZadeh, Behrang
Rahimi, Saeed
Source :
Advances in Natural Language Processing; 2006, p541-551, 11p
Publication Year :
2006

Abstract

Farsi, also known as Persian, is the official language of Iran and Tajikistan and one of the two main languages spoken in Afghanistan. It is an Indo-European agglutinating language written in Arabic script. This paper presents the first step in creating a basic language resources kit for Farsi. This step comprises the specifications for morphosyntactic encoding, which is based on the EAGLES/MULTEXT model and the specific resources of MULTEXT-East. The paper introduces the language, with an emphasis on its writing system and morphological properties, together with these specifications. It also introduces two other important issues: a novel Part of Speech (PoS) categorization and a unified orthography for Farsi in the digital environment. A lexicon and an annotated corpus are under preparation. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISBNs :
9783540373346
Database :
Complementary Index
Journal :
Advances in Natural Language Processing
Publication Type :
Book
Accession number :
32883594
Full Text :
https://doi.org/10.1007/11816508_54