Implementation of The Indonesian Language Stemming Algorithm in Twitter Data Preprocessing. Case Study: Twitter Wargabanua and Instakalsel.

Authors :: Rizki, Afian Syafaadi
Aristi, Nina Mia
Ridha, M. Najamudin
Zulfahri, Aidil Fajar
Wibowo, Dwi Agung
Source :: Fidelity: Jurnal Teknik Elektro; Sep2023, Vol. 5 Issue 3, p175-183, 9p
Publication Year :: 2023
Abstract: Stemming is a widely used method in the field of Natural Language Processing (NLP). Its primary purpose is to normalize words with similar meanings but different forms into a common representation by converting them into their basic or root forms. Stemming is typically applied during the data preprocessing stage to enhance the performance of NLP systems. For the Indonesian language, the Nazief stemming algorithm is the most commonly employed, and it has been developed and adapted for various regional languages in Indonesia. This research assesses the performance of the Nazief stemming algorithm on Twitter data from the accounts @wargabanua and @instakalsel. The goal is to evaluate how the algorithm handles text data that mixes two languages: Indonesian and Banjar. The test results indicate an accuracy rate of 90.34%. This demonstrates that the Nazief stemming algorithm can effectively process social media text data, even though it was not originally designed for the Banjar language. [ABSTRACT FROM AUTHOR]
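
Illustration: the abstract describes dictionary-based stemming and an accuracy measurement, which can be sketched roughly as follows. This is a minimal toy example, not the Nazief algorithm and not the authors' code. The root dictionary `ROOTS`, the short affix lists, the functions `stem` and `accuracy`, and the sample word pairs are all assumptions made for illustration. Accuracy is assumed here to be the fraction of words whose stem matches a hand-labeled root.

```python
# Toy root-word dictionary (hypothetical; a real stemmer relies on a full lexicon).
ROOTS = {"makan", "baca", "main", "jalan"}

# Small, deliberately incomplete affix lists (assumption: enough for the demo only).
SUFFIXES = ("lah", "kah", "pun", "nya", "ku", "mu", "kan", "an", "i")
PREFIXES = ("di", "ke", "se", "ber", "ter")


def stem(word, roots=ROOTS):
    """Return a dictionary root reachable by stripping one suffix and/or one prefix."""
    word = word.lower()
    if word in roots:
        return word

    # Collect candidates produced by removing a single suffix.
    candidates = [word]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            candidates.append(word[: -len(suffix)])

    # For every candidate so far, also try removing a single prefix.
    for cand in list(candidates):
        for prefix in PREFIXES:
            if cand.startswith(prefix) and len(cand) - len(prefix) >= 3:
                candidates.append(cand[len(prefix):])

    # Accept the first candidate found in the dictionary; otherwise leave the word as is.
    for cand in candidates:
        if cand in roots:
            return cand
    return word


def accuracy(pairs, stem_fn=stem):
    """Fraction of (word, expected_root) pairs that the stemmer gets right."""
    correct = sum(1 for word, expected in pairs if stem_fn(word) == expected)
    return correct / len(pairs)


# Hand-labeled sample (hypothetical). "membaca" fails because the toy
# prefix list does not cover the "mem-" prefix.
PAIRS = [
    ("makanan", "makan"),
    ("dibaca", "baca"),
    ("bermain", "main"),
    ("jalannya", "jalan"),
    ("membaca", "baca"),
]

if __name__ == "__main__":
    print(f"accuracy = {accuracy(PAIRS):.2%}")
```

Words that match no dictionary root pass through unchanged in this sketch, so how out-of-vocabulary tokens such as regional-language words are scored depends entirely on the hand-labeled roots.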
