
Exploring Character-Level Deep Learning Models for POS Tagging in Assamese Language.

Authors :
Phukan, Rituraj
Baruah, Nomi
Sarma, Shikhar Kr.
Konwar, Darpanjit
Source :
Procedia Computer Science; 2024, Vol. 235, p1467-1476, 10p
Publication Year :
2024

Abstract

This research investigates a novel approach using character-level Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (Bi-LSTM) models for part-of-speech (POS) tagging in the Assamese language. The work contributes to Natural Language Processing (NLP) by exploring these models' ability to assign grammatical labels (POS tags) to individual words within Assamese sentences. The corpus comprises 60,000 Assamese words and uses the LDCIL Assamese tagset. The corpus is split in an 80:20 ratio: 80% is used for training the models and the remaining 20% for evaluation. The character-level LSTM model achieves an accuracy of 92.80%, and the character-level Bi-LSTM model surpasses it with an accuracy of 93.36%. The proposed models outperform existing work on POS tagging in Assamese. The results broaden the understanding of POS tagging in Assamese and offer findings that could be applied to other languages with similar characteristics. [ABSTRACT FROM AUTHOR]
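
The evaluation setup described in the abstract (an 80:20 split, character-level input, and accuracy as the metric) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the sentence-level split, the fixed shuffle seed, and the placeholder corpus are all assumptions made for illustration, and the abstract does not say whether the split is by sentence or by word.

```python
import random

def split_corpus(sentences, train_ratio=0.8, seed=0):
    """Shuffle tagged sentences and split them into train and test portions.

    Assumption: the split unit is the sentence; the abstract only gives the ratio.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def build_char_vocab(sentences):
    """Map each character in the given sentences to an integer id.

    Ids 0 and 1 are reserved for padding and unknown characters (an assumption).
    """
    chars = sorted({ch for sent in sentences for word, _tag in sent for ch in word})
    return {ch: i + 2 for i, ch in enumerate(chars)}

def encode_word(word, vocab):
    """Turn a word into the list of character ids a character-level model would read."""
    return [vocab.get(ch, 1) for ch in word]

def token_accuracy(gold_tags, predicted_tags):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)

# Placeholder corpus: 10 sentences of (word, tag) pairs with made-up tokens and tags.
corpus = [[(f"w{i}a", "T1"), (f"w{i}b", "T2")] for i in range(10)]

train, test = split_corpus(corpus)
vocab = build_char_vocab(train)
```

With this placeholder corpus, `train` holds 8 sentences and `test` holds 2. A tagger such as an LSTM or Bi-LSTM would consume the id sequences from `encode_word`, and its predictions on `test` would be scored with `token_accuracy`.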

Details

Language :
English
ISSN :
18770509
Volume :
235
Database :
Supplemental Index
Journal :
Procedia Computer Science
Publication Type :
Academic Journal
Accession number :
177603718
Full Text :
https://doi.org/10.1016/j.procs.2024.04.138