ZeroBERTo: Leveraging Zero-Shot Text Classification by Topic Modeling

Authors :: Alcoforado, Alexandre
Ferraz, Thomas Palmeira
Gerber, Rodrigo
Bustos, Enzo
Oliveira, André Seidel
Veloso, Bruno Miguel
Siqueira, Fabio Levy
Costa, Anna Helena Reali
Source :: In: Pinheiro V. et al. (eds) Computational Processing of the Portuguese Language. PROPOR 2022. Lecture Notes in Computer Science, vol 13208. Springer, Cham
Publication Year :: 2022
Abstract: Traditional text classification approaches often require a good amount of labeled data, which is difficult to obtain, especially in restricted domains or less widespread languages. This lack of labeled data has led to the rise of low-resource methods, which assume low data availability in natural language processing. Among them, zero-shot learning stands out: it consists of learning a classifier without any previously labeled data. The best results reported with this approach use language models such as Transformers, but they suffer from two problems: high execution time and inability to handle long texts as input. This paper proposes a new model, ZeroBERTo, which leverages an unsupervised clustering step to obtain a compressed data representation before the classification task. We show that ZeroBERTo has better performance for long inputs and shorter execution time, outperforming XLM-R by about 12% in F1 score on the FolhaUOL dataset.
Keywords: Low-Resource NLP, Unlabeled data, Zero-Shot Learning, Topic Modeling, Transformers.
Comment: Accepted at PROPOR 2022: 15th International Conference on Computational Processing of Portuguese

Subjects :: Computer Science - Computation and Language
Computer Science - Artificial Intelligence
Computer Science - Machine Learning

Details

Database :: arXiv
Journal :: In: Pinheiro V. et al. (eds) Computational Processing of the Portuguese Language. PROPOR 2022. Lecture Notes in Computer Science, vol 13208. Springer, Cham
Publication Type :: Report
Accession number :: edsarx.2201.01337
Document Type :: Working Paper
Full Text :: https://doi.org/10.1007/978-3-030-98305-5_12