A Corpus-Based Approach for Automatic Thai Unknown Word Recognition using Ensemble Learning Techniques.

Authors :
TeCho, Jakkrit
Nattee, Cholwich
Theeramunkong, Thanaruk
Source :
Advances in Knowledge Discovery & Data Mining: 13th Pacific-Asia Conference, PAKDD 2009 Bangkok, Thailand, April 27-30, 2009 Proceedings; 2009, p533-540, 8p
Publication Year :
2009

Abstract

This paper presents a corpus-based approach for automatic unknown word recognition in Thai. The approach applies an ensemble learning technique to generate a model that classifies unknown word candidates using features obtained from a corpus. We propose a technique called "group-based evaluation by ranking". It clusters the unknown word candidates into groups based on the locations where they occur. The candidate with the highest accuracy is then identified as an unknown word. In this task, the number of positive instances is substantially smaller than the number of negative instances, forming an unbalanced data set. To improve the prediction accuracy, we apply a boosting technique with "voting under group-based evaluation by ranking". We have conducted experiments on real-world data to evaluate the performance of the proposed approach. The experiments compared the accuracy of our technique with that of an ordinary naïve Bayes technique. Our technique achieves an accuracy of 90.93±0.50% when the first rank is selected and 97.90±0.26% when the candidates up to the tenth rank are considered. This is a 6.79% to 8.45% improvement. [ABSTRACT FROM AUTHOR]
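
The abstract does not spell out how group-based evaluation by ranking is computed, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes each candidate carries a classifier score, groups candidates by the location where they occur, ranks each group by score, and counts a group as correct when the true unknown word falls within the top k ranks. The field names (location, score, is_unknown_word), the function names, and the toy data are all hypothetical; none of the numbers come from the paper.

```python
from collections import defaultdict


def rank_groups(candidates):
    """Group candidates by location and sort each group by descending score."""
    groups = defaultdict(list)
    for cand in candidates:
        groups[cand["location"]].append(cand)
    return {
        loc: sorted(members, key=lambda c: c["score"], reverse=True)
        for loc, members in groups.items()
    }


def top_k_accuracy(ranked, k):
    """Fraction of groups whose true unknown word appears in the top k ranks."""
    hits = sum(
        any(c["is_unknown_word"] for c in members[:k])
        for members in ranked.values()
    )
    return hits / len(ranked)


# Hypothetical toy data: two locations, two candidates each.
candidates = [
    {"location": 0, "text": "ab", "score": 0.91, "is_unknown_word": True},
    {"location": 0, "text": "abc", "score": 0.40, "is_unknown_word": False},
    {"location": 1, "text": "xy", "score": 0.75, "is_unknown_word": False},
    {"location": 1, "text": "xyz", "score": 0.60, "is_unknown_word": True},
]
ranked = rank_groups(candidates)
print(top_k_accuracy(ranked, 1))
print(top_k_accuracy(ranked, 2))
```

Evaluating per group rather than per candidate is one way to sidestep the class imbalance the abstract mentions: each location contributes a single hit or miss, regardless of how many negative candidates it contains.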

Details

Language :
English
ISBNs :
9783642013065
Database :
Complementary Index
Journal :
Advances in Knowledge Discovery & Data Mining: 13th Pacific-Asia Conference, PAKDD 2009 Bangkok, Thailand, April 27-30, 2009 Proceedings
Publication Type :
Book
Accession number :
76836350
Full Text :
https://doi.org/10.1007/978-3-642-01307-2_50