Exploring Character-Level Deep Learning Models for POS Tagging in Assamese Language.

Authors :: Phukan, Rituraj
Baruah, Nomi
Sarma, Shikhar Kr.
Konwar, Darpanjit
Source :: Procedia Computer Science; 2024, Vol. 235, p1467-1476, 10p
Publication Year :: 2024
Abstract: This research investigates a novel approach based on character-level Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (Bi-LSTM) models for part-of-speech (POS) tagging in the Assamese language. The work contributes to Natural Language Processing (NLP) by exploring these models' ability to assign grammatical labels (POS tags) to individual words within Assamese sentences. The corpus comprises 60,000 Assamese words and uses the LDCIL Assamese tagset. The corpus is split in an 80:20 ratio: 80% is used to train the models and the remaining 20% is used for evaluation. The character-level LSTM model achieves an accuracy of 92.80%, while the character-level Bi-LSTM model surpasses it with an accuracy of 93.36%. The proposed approach outperforms existing research on POS tagging in Assamese. The results broaden the understanding of POS tagging in Assamese and offer findings that could be applied to other languages with similar characteristics. [ABSTRACT FROM AUTHOR]
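
The evaluation setup described in the abstract (an 80:20 split, character-level word inputs, and accuracy as the metric) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the sentence-level shuffling, the toy corpus, and the placeholder tags `TAG_A` and `TAG_B` are all assumptions made for illustration; the actual LDCIL tagset and the LSTM/Bi-LSTM models themselves are not reproduced here.

```python
import random

def split_corpus(sentences, train_fraction=0.8, seed=0):
    """Shuffle tagged sentences and split them into train and test portions."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def build_char_vocab(sentences):
    """Assign an integer id to every character seen in the given sentences.

    Ids 0 and 1 are reserved (assumed convention): 0 = padding, 1 = unknown.
    """
    chars = sorted({ch for sent in sentences for word, _tag in sent for ch in word})
    return {ch: i + 2 for i, ch in enumerate(chars)}

def encode_word(word, char_vocab):
    """Turn a word into a list of character ids, the input a character-level model reads."""
    return [char_vocab.get(ch, 1) for ch in word]

def token_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    total = sum(len(tags) for tags in gold)
    correct = sum(g == p for gs, ps in zip(gold, predicted) for g, p in zip(gs, ps))
    return correct / total if total else 0.0

# Hypothetical toy corpus: 10 two-word sentences with placeholder tokens and tags.
corpus = [[(f"w{i}a", "TAG_A"), (f"w{i}b", "TAG_B")] for i in range(10)]

train, test = split_corpus(corpus)          # 8 training sentences, 2 test sentences
char_vocab = build_char_vocab(train)        # vocabulary is built from training data only
encoded = encode_word("w0a", char_vocab)    # one integer id per character
```

Building the character vocabulary from the training portion only keeps test-set characters out of the model's input space; any character not seen in training maps to the unknown id.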

Subjects :: NATURAL language processing
DEEP learning
LANGUAGE research
LANGUAGE & languages
