A Corpus-Based Approach for Automatic Thai Unknown Word Recognition using Ensemble Learning Techniques.
- Source :
- Advances in Knowledge Discovery & Data Mining: 13th Pacific-Asia Conference, PAKDD 2009, Bangkok, Thailand, April 27-30, 2009, Proceedings; 2009, p533-540, 8p
- Publication Year :
- 2009
Abstract
- This paper presents a corpus-based approach for automatic unknown word recognition in Thai. This approach applies an ensemble learning technique to generate a model for classifying unknown word candidates using features obtained from a corpus. We propose a technique called "group-based evaluation by ranking". It clusters the unknown word candidates into groups based on the locations where they occur. The candidate with the highest accuracy is then identified as an unknown word. In this task, the number of positive instances is much smaller than that of negative instances, forming an unbalanced data set. To improve the prediction accuracy, we apply a boosting technique with "voting under group-based evaluation by ranking". We have conducted experiments on real-world data to evaluate the performance of the proposed approach. The experiments compared the accuracy of our technique with that of an ordinary naïve Bayes technique. Our technique achieves an accuracy of 90.93±0.50% when the first rank is selected and 97.90±0.26% when the candidates up to the tenth rank are considered. This is an improvement of 6.79% to 8.45%. [ABSTRACT FROM AUTHOR]
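The grouping-and-ranking idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `(location, text, score)` tuple layout, and the toy scores are all assumptions made for illustration, and the score stands in for whatever confidence value a trained classifier would supply. The sketch groups candidates by location, ranks each group by score, and measures how often the correct word falls within the first k ranks.

```python
from collections import defaultdict


def rank_within_groups(candidates):
    """Group candidates by location and rank each group by descending score.

    candidates: iterable of (location, text, score) tuples (hypothetical layout).
    Returns a dict mapping each location to its candidate texts, best first.
    """
    groups = defaultdict(list)
    for location, text, score in candidates:
        groups[location].append((text, score))
    return {
        loc: [t for t, _ in sorted(items, key=lambda x: x[1], reverse=True)]
        for loc, items in groups.items()
    }


def top_k_accuracy(ranked, gold, k=1):
    """Fraction of groups whose correct word appears in the first k ranks."""
    hits = sum(1 for loc, words in ranked.items() if gold.get(loc) in words[:k])
    return hits / len(ranked) if ranked else 0.0


# Toy data: two locations, each with competing candidates (invented values).
candidates = [
    (0, "abc", 0.2), (0, "abcd", 0.9), (0, "ab", 0.5),
    (7, "xy", 0.6), (7, "xyz", 0.4),
]
gold = {0: "abcd", 7: "xyz"}

ranked = rank_within_groups(candidates)
acc_at_1 = top_k_accuracy(ranked, gold, k=1)
acc_at_2 = top_k_accuracy(ranked, gold, k=2)
```

Evaluating per group instead of per candidate is what makes the "first rank" and "up to the tenth rank" figures in the abstract comparable: each location contributes one decision, however many negative candidates it contains.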
Details
- Language :
- English
- ISBNs :
- 9783642013065
- Database :
- Complementary Index
- Journal :
- Advances in Knowledge Discovery & Data Mining: 13th Pacific-Asia Conference, PAKDD 2009, Bangkok, Thailand, April 27-30, 2009, Proceedings
- Publication Type :
- Book
- Accession number :
- 76836350
- Full Text :
- https://doi.org/10.1007/978-3-642-01307-2_50