
An Efficient Text Classification Using fastText for Bahasa Indonesia Documents Classification

Authors :
Teddy Mantoro
Opim Salim Sitompul
Amalia Amalia
Erna Budhiarti Nababan
Source :
2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA).
Publication Year :
2020
Publisher :
IEEE, 2020.

Abstract

Text classification using a simple word representation with a linear classifier is often considered a strong baseline. However, a simple word representation such as Bag of Words (BOW) suffers from the curse of dimensionality, so it is only suitable for small datasets. BOW also needs language-dependent pre-processing steps such as stopword removal and stemming. Therefore, the BOW model cannot be applied automatically, because it depends on a specific language. On the other hand, deep neural network classifiers can eliminate the pre-processing prerequisite, but they are not efficient in processing time and need a large dataset for learning. This is a challenge for languages with limited resources, such as Bahasa Indonesia. Another approach is the fastText model, which can minimize pre-processing dependencies and is more efficient in training time. However, there has been little study of whether the fastText model outperforms the BOW model on small datasets. This paper compares text classification using the TFIDF model, as one of the BOW models, with a fastText model on 500 news articles in Bahasa Indonesia. The results show that both models achieve an outstanding performance of 0.97 F-Score. The TFIDF model needs longer pre-processing stages and requires more training time. Meanwhile, the fastText model only needs some hyperparameter tuning to reach performance similar to that of the TFIDF model. Based on this study, we conclude that the fastText model is an efficient text classifier.
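As an illustration of the TFIDF representation mentioned in the abstract, the following is a minimal sketch using only the Python standard library. The record does not state which TFIDF variant the authors used, so the weighting here (relative term frequency times the natural log of inverse document frequency) is an assumption, and the function name `tfidf` and the three-sentence toy corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (count / doc length) * ln(N / document frequency).
    This is one common variant, not necessarily the one used in the paper.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy, made-up corpus; whitespace tokenization only, with no
# stopword removal or stemming.
docs = [
    "harga saham naik".split(),
    "harga minyak turun".split(),
    "tim sepak bola menang".split(),
]
weights = tfidf(docs)
vocabulary = {term for doc in docs for term in doc}
```

In this sketch, a term that appears in several documents ("harga") receives a lower weight than a term unique to one document ("saham"). Each document is represented over the whole vocabulary, so the number of dimensions grows with the number of distinct terms in the corpus, which is the dimensionality concern the abstract raises for BOW models.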

Details

Database :
OpenAIRE
Journal :
2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)
Accession number :
edsair.doi...........7e80645f474cf2cb338292e7961959f5
Full Text :
https://doi.org/10.1109/databia50434.2020.9190447