
Amharic Text Chunker using Conditional Random Fields

Authors :
Birchiko Achamyeleh
Gebeyehu Belay
Birhan Hailu
Source :
CESS (Journal of Computer Engineering, System and Science); Vol 7, No 1 (2022): January 2022; 23-30
Publication Year :
2021
Publisher :
Foundation of Computer Science, 2021.

Abstract

This paper introduces an Amharic text chunker based on conditional random fields (CRFs). To find the optimal feature set for the chunker, the researchers conducted experiments under different scenarios until a promising result was obtained. For the training and testing datasets, sentences were collected from Amharic grammar books, news articles, magazines, and news from Walta Information Center (WIC). These sentences were analyzed and tagged manually and used as the corpus for model training and testing. The entire dataset was manually chunk-tagged, and the tagging was approved by linguistic professionals. The IOB2 chunk specification was selected to identify phrase boundaries. Across all experiments, the maximum overall accuracy was 97.26%, obtained with a window of two words on both sides together with the POS tag of each token. The worst performance, 84.57%, was obtained with a window of only one word on each side (left and right).
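The abstract relies on two ingredients: IOB2 chunk tags and context windows of one or two words (plus POS tags) on each side of a token. The Python sketch below shows one way these could be handled. In IOB2, B-X opens a chunk of type X, I-X continues it, and O marks a token outside any chunk. The function names, the `<PAD>` filler, the feature-key format, and the example sentence with its POS and chunk labels are all hypothetical. None of them come from the paper, its tagset, or its corpus.

```python
from typing import Dict, List, Optional, Tuple

PAD = "<PAD>"  # hypothetical filler for positions that fall outside the sentence


def window_features(words: List[str], pos: List[str], i: int, k: int) -> Dict[str, str]:
    """Return the words and POS tags within k positions left and right of token i."""
    feats: Dict[str, str] = {}
    for off in range(-k, k + 1):
        j = i + off
        inside = 0 <= j < len(words)
        feats[f"w[{off}]"] = words[j] if inside else PAD
        feats[f"pos[{off}]"] = pos[j] if inside else PAD
    return feats


def sentence_features(words: List[str], pos: List[str], k: int) -> List[Dict[str, str]]:
    """One feature dictionary per token, the shape many CRF toolkits consume."""
    return [window_features(words, pos, i, k) for i in range(len(words))]


def iob2_to_chunks(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Decode IOB2 tags into (label, start, end) spans with end exclusive.

    B-X opens a chunk of type X, I-X continues it, O is outside any chunk.
    An I-X that does not continue an open X chunk is leniently treated as B-X.
    """
    chunks: List[Tuple[str, int, int]] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last chunk
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if label is not None:
                chunks.append((label, start, i))
            label = None
            if tag != "O":
                label, start = tag[2:], i
    return chunks


if __name__ == "__main__":
    # Illustrative sentence ("the boy built a big house"); POS and chunk
    # labels are hypothetical, not the paper's tagset or annotations.
    words = ["ልጁ", "ትልቅ", "ቤት", "ሠራ"]
    pos = ["N", "ADJ", "N", "V"]
    tags = ["B-NP", "B-NP", "I-NP", "B-VP"]

    feats = sentence_features(words, pos, k=2)
    print(feats[0]["w[1]"], feats[0]["w[-1]"])  # ትልቅ <PAD>
    print(iob2_to_chunks(tags))  # [('NP', 0, 1), ('NP', 1, 3), ('VP', 3, 4)]
```

Setting `k=1` or `k=2` corresponds to the two window sizes mentioned in the abstract. Dropping the `pos[...]` keys would give a words-only configuration. The per-token dictionaries use the list-of-dicts shape accepted by CRF libraries such as sklearn-crfsuite, although the abstract does not say which CRF implementation the authors used.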

Details

ISSN :
09758887, 2502714X, and 25027131
Volume :
183
Database :
OpenAIRE
Journal :
International Journal of Computer Applications
Accession number :
edsair.doi.dedup.....691a3832d79093527308304c84b266fd
Full Text :
https://doi.org/10.5120/ijca2021921694