A Character String-Based Stemming for Morphologically Derivative Languages.

Authors :
Imin, Gvzelnur
Ablimit, Mijit
Yilahun, Hankiz
Hamdulla, Askar
Source :
Information (2078-2489); Apr2022, Vol. 13 Issue 4, p170-170, 16p
Publication Year :
2022

Abstract

Morphologically derivative languages form words by fusing stems and suffixes, so extracting stems is important for cross-lingual alignment and knowledge transfer. Because linguistic particles exhibit both phonetic harmony and disharmony when they are combined, both phonetic and morphological changes need to be analyzed. This paper proposes a multilingual stemming method that learns morpho-phonetic changes automatically, based on character-based embedding and sequential modeling. First, sentence-level character feature embeddings are used as input, and a BiLSTM model captures the forward and backward context sequences. An attention mechanism is added to this model for weight learning, extracting global feature information to capture the stem and affix boundaries. Finally, a CRF model learns further information from the sequence features to describe context more effectively. To verify its effectiveness, the proposed model is compared with traditional models on two different datasets of three derivative languages: Uyghur, Kazakh, and Kirghiz. The experimental results show that the proposed model achieves the best stemming performance on the multilingual sentence-level datasets. In addition, the proposed model outperforms the traditional models, fully considers the characteristics of the data, and requires less human intervention. [ABSTRACT FROM AUTHOR]
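
The abstract frames stemming as labelling each character and then decoding a consistent label sequence. The following is a minimal Python sketch of that framing only, not the authors' implementation. The tag names (`B-S`, `I-S`, `B-A`, `I-A`), the function names, and the example word are all assumptions made for illustration. The `emissions` argument stands in for the per-character scores that a BiLSTM with attention would produce, and `viterbi` shows the kind of decoding a CRF layer performs over those scores and a table of transition scores.

```python
from typing import List, Sequence, Tuple

# Hypothetical tag set (not taken from the paper):
# B-S / I-S = first / later character of the stem,
# B-A / I-A = first / later character of an affix.


def tag_characters(stem: str, suffixes: Sequence[str]) -> List[Tuple[str, str]]:
    """Turn a segmented word into (character, label) training pairs."""
    pairs = []
    for i, ch in enumerate(stem):
        pairs.append((ch, "B-S" if i == 0 else "I-S"))
    for suffix in suffixes:
        for i, ch in enumerate(suffix):
            pairs.append((ch, "B-A" if i == 0 else "I-A"))
    return pairs


def extract_stem(chars: Sequence[str], labels: Sequence[str]) -> str:
    """Keep only the characters whose label marks them as part of the stem."""
    return "".join(c for c, lab in zip(chars, labels) if lab.endswith("-S"))


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label path.

    emissions[t][k]   : score of label k at position t (assumed model output)
    transitions[j][k] : score of moving from label j to label k
    """
    n = len(emissions[0])
    score = list(emissions[0])
    back = []  # back[t-1][k] = best previous label for label k at position t
    for emit in emissions[1:]:
        prev = score
        pointers, score = [], []
        for k in range(n):
            best = max(range(n), key=lambda j: prev[j] + transitions[j][k])
            pointers.append(best)
            score.append(prev[best] + transitions[best][k] + emit[k])
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(range(n), key=lambda k: score[k])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Illustrative word, assumed segmentation: "kitab" (stem) + "lar" (suffix).
pairs = tag_characters("kitab", ["lar"])
chars = [c for c, _ in pairs]
labels = [lab for _, lab in pairs]
stem = extract_stem(chars, labels)
```

The transition table is where a CRF can encode constraints that per-character classification cannot, for example by giving a strongly negative score to a move from an affix label back to a stem label.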

Details

Language :
English
ISSN :
20782489
Volume :
13
Issue :
4
Database :
Complementary Index
Journal :
Information (2078-2489)
Publication Type :
Academic Journal
Accession number :
156530999
Full Text :
https://doi.org/10.3390/info13040170