
Isarn Dharma word segmentation.

Authors :
Somsap, Sittichai
Seresangtakul, Pusadee
Source :
2013 International Conference on Control, Automation & Information Sciences (ICCAIS); 2013, p53-57, 5p
Publication Year :
2013

Abstract

This paper presents Isarn Dharma word segmentation based on the Isarn Dharma writing system and a dictionary. In this study, input text is first segmented into a sequence of Isarn Dharma Character Clusters (IDCCs). Each IDCC represents a group of inseparable Isarn Dharma characters, as defined by the Isarn Dharma writing system. The IDCC sequence is then used as input to the IDCC longest matching algorithm, which searches the dictionary for the most suitable segmentation into words. Grouping rules are then applied to group adjacent remaining IDCCs that do not match any Isarn word in the dictionary. To evaluate the proposed technique, Isarn literature, Jataka tales, legends, and Buddha's prophecies were used as test data, and the proposed method (IDCC longest matching combined with the grouping rules) was compared with standard longest matching and with IDCC longest matching alone. The experimental results show F-measures of 80.15%, 85.06%, and 86.07% for longest matching, IDCC longest matching, and the proposed method, respectively. [ABSTRACT FROM PUBLISHER]
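The abstract describes the pipeline only at a high level. The IDCC formation rules, the dictionary, and the actual grouping rules are given in the full paper, not in this record. The following Python sketch illustrates the general idea under stated assumptions. It assumes the text has already been split into IDCCs (passed in as a list of strings) and uses Latin placeholders instead of Isarn Dharma characters. Its grouping rule, which merges each run of adjacent unmatched IDCCs into one token, is a hypothetical simplification and not the authors' rules. The F-measure is computed over matching word spans, which is one common convention; the abstract does not state the paper's exact evaluation protocol.

```python
from typing import List, Sequence, Set, Tuple


def idcc_longest_matching(clusters: Sequence[str], dictionary: Set[str]) -> List[str]:
    """Greedy left-to-right longest matching in which word boundaries may only
    fall between clusters (never inside one).

    Assumption (hypothetical grouping rule): every run of adjacent clusters
    that starts no dictionary word is merged into a single token.
    """
    words: List[str] = []
    pending: List[str] = []  # run of adjacent clusters that matched no word
    i = 0
    while i < len(clusters):
        match_end = None
        # Try the longest candidate first, shrinking one whole cluster at a time.
        for j in range(len(clusters), i, -1):
            if "".join(clusters[i:j]) in dictionary:
                match_end = j
                break
        if match_end is None:
            pending.append(clusters[i])
            i += 1
        else:
            if pending:  # flush the grouped unmatched clusters first
                words.append("".join(pending))
                pending = []
            words.append("".join(clusters[i:match_end]))
            i = match_end
    if pending:
        words.append("".join(pending))
    return words


def word_spans(words: Sequence[str]) -> Set[Tuple[int, int]]:
    """Convert a segmentation into a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def word_f_measure(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """F-measure where a predicted word counts as correct only if its span
    exactly matches a reference word span."""
    pred, ref = word_spans(predicted), word_spans(reference)
    correct = len(pred & ref)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(ref)
    return 2 * precision * recall / (precision + recall)


# Placeholder clusters and dictionary (hypothetical, not Isarn Dharma data).
clusters = ["ka", "ma", "la", "xo", "zi", "na"]
dictionary = {"ka", "kama", "la", "na"}
segmented = idcc_longest_matching(clusters, dictionary)
print(segmented)  # ['kama', 'la', 'xozi', 'na']
print(word_f_measure(segmented, ["kama", "la", "xo", "zina"]))  # 0.5
```

Here "xo" and "zi" start no dictionary word, so the hypothetical grouping rule merges them into "xozi". Because matching operates on whole clusters, a boundary can never split an IDCC. This is the property that distinguishes IDCC longest matching from plain character-level longest matching.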

Details

Language :
English
ISBNs :
9781479905690
Database :
Complementary Index
Journal :
2013 International Conference on Control, Automation & Information Sciences (ICCAIS)
Publication Type :
Conference
Accession number :
94521175
Full Text :
https://doi.org/10.1109/ICCAIS.2013.6720529