Pashto Language Stemming Algorithm

Authors :
Sebghatullah Aslamzai
Saidah Saad
Source :
Asia-Pacific Journal of Information Technology and Multimedia, Vol 4, Iss (1), Pp 25-37 (2015)
Publication Year :
2015
Publisher :
UKM Press, 2015.

Abstract

This paper presents a stemming algorithm for the morphological analysis of less popular or minor languages such as Pashto. As a less popular language, Pashto lacks the resources and tools needed for applications such as document indexing, clustering, language processing, text analysis, database search systems, information retrieval, and linguistic applications. A review of the literature shows that very few morphological studies have been conducted on Pashto and that the language has not yet been fully analyzed. In addition, no stemming algorithm has previously been proposed for extracting Pashto root words from a Pashto corpus. In this paper, a Pashto corpus is used directly as the input, and the stemming algorithm handles both inflectional and derivational morphemes. The output is a meaningful root word without affixes. To validate the algorithm, two native speakers of Pashto were recruited to evaluate its accuracy and strength.
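As a rough illustration of the kind of affix-stripping stemmer and evaluation the abstract describes, the following minimal sketch removes one prefix and one suffix per word and scores the result. Everything in it is an assumption rather than the authors' method: the function names, the minimum stem length, the two-entry affix lists (one plural suffix and one negating prefix, chosen only as examples), and the use of the index compression factor as a stand-in for "strength". The abstract does not state the paper's actual rules or its strength measure.

```python
# Illustrative affix lists only; these are NOT the rules from the paper.
SUFFIXES = ["ونه"]   # assumed example: a plural ending, as in کتابونه -> کتاب
PREFIXES = ["نا"]    # assumed example: a negating prefix, as in ناروغ -> روغ
MIN_STEM_LEN = 2     # hypothetical guard against over-stemming short words


def stem(word, prefixes=PREFIXES, suffixes=SUFFIXES, min_len=MIN_STEM_LEN):
    """Strip at most one prefix and one suffix, trying longer affixes first."""
    for p in sorted(prefixes, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= min_len:
            word = word[len(p):]
            break
    for s in sorted(suffixes, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[:-len(s)]
            break
    return word


def accuracy(words, gold_stems):
    """Fraction of words whose stem matches a human-provided gold stem."""
    if not words:
        return 0.0
    correct = sum(1 for w, g in zip(words, gold_stems) if stem(w) == g)
    return correct / len(words)


def index_compression(words):
    """One common strength measure: (unique words - unique stems) / unique words."""
    unique_words = set(words)
    if not unique_words:
        return 0.0
    unique_stems = {stem(w) for w in unique_words}
    return (len(unique_words) - len(unique_stems)) / len(unique_words)
```

In a setup like this, the gold stems would come from human annotators (the abstract mentions two native speakers), and a higher compression value would indicate a more aggressive stemmer.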

Details

Language :
English, Malay
ISSN :
2289-2192
Volume :
4
Issue :
(1)
Database :
Directory of Open Access Journals
Journal :
Asia-Pacific Journal of Information Technology and Multimedia
Publication Type :
Academic Journal
Accession number :
edsdoj.636ee3aeef344d4fbd8ad28bfaed2c1e
Document Type :
article
Full Text :
https://doi.org/10.17576/apjitm-2015-0401-03