Isarn Dharma word segmentation.
- Source :
- 2013 International Conference on Control, Automation & Information Sciences (ICCAIS); 2013, p53-57, 5p
- Publication Year :
- 2013
Abstract
- This paper presents Isarn Dharma word segmentation based on the Isarn Dharma writing system and a dictionary. In this study, input text is first segmented into sequences of Isarn Dharma Character Clusters (IDCCs). Each IDCC represents a group of inseparable Isarn Dharma characters, as defined by the Isarn Dharma writing system. The sequence of IDCCs is then used as input to find the most suitable word segmentation from the dictionary using the IDCC longest matching algorithm. Grouping rules are then applied to group adjacent remaining IDCCs that do not match any Isarn word in the dictionary. To evaluate the proposed technique, Isarn literature, Jataka, legend, and Buddha foretell texts were used as testing data, and the proposed system was compared with longest matching and with IDCC longest matching. The experimental results showed F-measures of 80.15%, 85.06%, and 86.07% for longest matching, the IDCC longest matching algorithm, and the proposed method, respectively. [ABSTRACT FROM PUBLISHER]
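The pipeline described in the abstract (cluster sequence, dictionary longest matching, then grouping of unmatched clusters) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the text has already been split into clusters, uses Latin placeholder strings instead of real IDCCs, and stands in for the paper's grouping rules (which the abstract does not specify) with one hypothetical rule: merge adjacent unmatched clusters into a single token. The names `longest_match`, `group_unmatched`, and `max_len` are illustrative only.

```python
from typing import List, Set, Tuple

Token = Tuple[str, bool]  # (text, found_in_dictionary)


def longest_match(clusters: List[str],
                  dictionary: Set[Tuple[str, ...]],
                  max_len: int) -> List[Token]:
    """Greedy longest matching over a sequence of clusters.

    At each position, try the longest candidate first (up to max_len
    clusters) and shrink until a dictionary entry is found. A cluster
    with no match is emitted alone and flagged as unknown.
    """
    out: List[Token] = []
    i = 0
    while i < len(clusters):
        match_len = 0
        for n in range(min(max_len, len(clusters) - i), 0, -1):
            if tuple(clusters[i:i + n]) in dictionary:
                match_len = n
                break
        if match_len:
            out.append(("".join(clusters[i:i + match_len]), True))
            i += match_len
        else:
            out.append((clusters[i], False))
            i += 1
    return out


def group_unmatched(tokens: List[Token]) -> List[Token]:
    """Hypothetical grouping rule: merge runs of adjacent unknown clusters."""
    grouped: List[Token] = []
    for text, known in tokens:
        if not known and grouped and not grouped[-1][1]:
            grouped[-1] = (grouped[-1][0] + text, False)
        else:
            grouped.append((text, known))
    return grouped


# Placeholder clusters and dictionary entries (assumptions, not real IDCCs).
clusters = ["ka", "ma", "x", "y", "na"]
dictionary = {("ka", "ma"), ("ka",), ("na",)}

matched = longest_match(clusters, dictionary, max_len=2)
words = [text for text, _ in group_unmatched(matched)]
print(words)
```

Matching on clusters rather than on single characters means a candidate word boundary can never fall inside an inseparable character group, which is the motivation the abstract gives for the IDCC step.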
Details
- Language :
- English
- ISBNs :
- 9781479905690
- Database :
- Complementary Index
- Journal :
- 2013 International Conference on Control, Automation & Information Sciences (ICCAIS)
- Publication Type :
- Conference
- Accession number :
- 94521175
- Full Text :
- https://doi.org/10.1109/ICCAIS.2013.6720529