A hybrid approach to Pali Sandhi segmentation using BiLSTM and rule-based analysis

Authors :
Klangjai Tammanam
Nuttachot Promrit
Sajjaporn Waijanya
Source :
Engineering and Applied Science Research, Vol 48, Iss 5, Pp 614-626 (2021)
Publication Year :
2021
Publisher :
Khon Kaen University, 2021.

Abstract

Pali Sandhi is a phonetic transformation that joins two words into a new word: the phonemes at the boundary of the neighbouring words are changed and merged. Pali Sandhi word segmentation is more challenging than Thai word segmentation because Pali is a highly inflected language. This study proposes a novel approach that predicts splitting locations by classifying sample Sandhi words into five classes with a bidirectional long short-term memory (BiLSTM) model. Rules selected according to the predicted class are then applied to rectify the words at the splitting locations. We identified 6,345 Pali Sandhi words from the Dhammapada Atthakatha. We evaluated the performance of the proposed model on the accuracy of the splitting locations, comparing its output with the answers in the dataset. Results showed that 92.20% of the splitting locations were correct, 1.10% of the Pali Sandhi words were predicted to have no splitting location, and 5.83% did not match the answers (incomplete segmentation).
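
The rule-based second stage and the three evaluation outcomes described in the abstract can be illustrated with a minimal Python sketch. It does not model the BiLSTM: the split index and class label are taken as given inputs. The rule labels (`plain`, `elided_a`, `long_a`), the `RULES` table, and the functions `segment` and `evaluate` are hypothetical names chosen for illustration; they are not the five classes or the rule set used in the paper, and the two example words are common textbook cases rather than items from the paper's dataset.

```python
import unicodedata
from collections import Counter

# Hypothetical rule table (an assumption, not the paper's classes):
# label -> (merged characters to drop at the cut,
#           text restored on the left word,
#           text restored on the right word)
RULES = {
    "plain": (0, "", ""),      # cut only, nothing restored
    "elided_a": (0, "a", ""),  # left word lost a final "a"
    "long_a": (1, "a", "a"),   # a long "ā" stands for a + a
}


def segment(word, split_at, rule_class):
    """Return (left, right), or None when no splitting location is given."""
    if split_at is None:
        return None
    # Use precomposed characters so that one letter is one index position.
    word = unicodedata.normalize("NFC", word)
    drop, left_add, right_add = RULES[rule_class]
    left = word[:split_at] + left_add
    right = right_add + word[split_at + drop:]
    return left, right


def evaluate(predictions, gold):
    """Tally the three outcomes named in the abstract.

    predictions: iterable of (word, split_at, rule_class)
    gold: dict mapping each word to its reference (left, right) pair
    """
    tally = Counter()
    for word, split_at, rule_class in predictions:
        result = segment(word, split_at, rule_class)
        if result is None:
            tally["no_split"] += 1
        elif result == gold[word]:
            tally["correct"] += 1
        else:
            tally["mismatch"] += 1
    return tally


if __name__ == "__main__":
    print(segment("natthi", 1, "elided_a"))
    print(segment("tatrāyaṃ", 4, "long_a"))
```

Under these assumed rules, `natthi` cut after the first letter is restored to `na` and `atthi`, and `tatrāyaṃ` cut after `tatr` is restored to `tatra` and `ayaṃ`. Dividing each count returned by `evaluate` by the number of words gives percentages of the kind reported above.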

Details

Language :
English
ISSN :
2539-6161 and 2539-6218
Volume :
48
Issue :
5
Database :
Directory of Open Access Journals
Journal :
Engineering and Applied Science Research
Publication Type :
Academic Journal
Accession number :
edsdoj.4314028e98404979931b179463334755
Document Type :
article