BaNeL: an encoder-decoder based Bangla neural lemmatizer
- Source :
- SN Applied Sciences, Vol 4, Iss 5, Pp 1-15 (2022)
- Publication Year :
- 2022
- Publisher :
- Springer, 2022.
Abstract
- This study presents an efficient framework for deriving the lemma of an inflected Bangla word, using its part of speech as context. Bangla is a morphologically rich Indo-Aryan language in which around 70% of words are inflected, and some words have around 90 different inflected forms, making it one of the most challenging languages for lemmatization. The lack of a sufficiently large, appropriate Bangla dataset makes the task even harder. A reliable, robust Bangla lemmatizer would open new possibilities for dependent fields such as automatic translation and grammatical correction in Bangla. This paper describes a new, larger Bangla dataset for lemmatization and an encoder-decoder-based sequence-to-sequence framework for the task. After hyper-parameter tuning, the proposed framework achieved 95.75% character accuracy and 91.81% exact match on the test split of the prepared dataset, which is significantly higher than existing approaches to Bangla lemmatization. Article Highlights: This article discusses the lemmatization task in Bangla and demonstrates how it differs from stemming; presents an efficient artificial-neural-network-based model for lemmatization that performs better than existing ones; and describes a new large dataset for lemmatization in Bangla.
Details
- Language :
- English
- ISSN :
- 2523-3963 and 2523-3971
- Volume :
- 4
- Issue :
- 5
- Database :
- Directory of Open Access Journals
- Journal :
- SN Applied Sciences
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.88511ea89b3b4dc69e9785a36cef10f1
- Document Type :
- article
- Full Text :
- https://doi.org/10.1007/s42452-022-04985-2