Comparative Study of Multiclass Text Classification in Research Proposals Using Pretrained Language Models
- Source :
- Applied Sciences, Vol 12, Iss 9, p 4522 (2022)
- Publication Year :
- 2022
- Publisher :
- MDPI AG, 2022.
Abstract
- Recently, transformer-based pretrained language models have demonstrated stellar performance in natural language understanding (NLU) tasks. For example, bidirectional encoder representations from transformers (BERT) has achieved outstanding performance through masked self-supervised pretraining and transformer-based modeling. However, the original BERT may only be effective for English-based NLU tasks, whereas its effectiveness for other languages, such as Korean, is limited. Thus, the applicability of BERT-based language models pretrained in languages other than English to NLU tasks in those languages must be investigated. In this study, we comparatively evaluated seven BERT-based pretrained language models and their expected applicability to Korean NLU tasks. We used the climate technology dataset, a large Korean text classification dataset of research proposals spanning 45 classes. We found that the BERT-based model pretrained on the most recent Korean corpus performed best on Korean multiclass text classification. This suggests the necessity of optimal pretraining for specific NLU tasks, particularly those in languages other than English.
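The multiclass setup the abstract describes (a single softmax head over 45 proposal classes on top of a pretrained encoder) can be sketched independently of any particular checkpoint. The class count (45) comes from the abstract; the logits, the chosen class index, and the helper names below are purely illustrative assumptions, standing in for the output of a fine-tuned classification head.

```python
import math

NUM_CLASSES = 45  # climate-technology proposal classes, per the abstract


def softmax(logits):
    # numerically stable softmax: subtract the max before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    # fine-tuning minimizes the negative log-likelihood of the true class
    return -math.log(softmax(logits)[true_class])


def predict(logits):
    # at inference time the classifier picks the argmax class
    return max(range(len(logits)), key=lambda i: logits[i])


# hypothetical logits, e.g. from a classification head on the [CLS] token
logits = [0.0] * NUM_CLASSES
logits[7] = 3.0  # illustrative: the head is most confident in class 7

probs = softmax(logits)
print(predict(logits))       # argmax class index
print(round(sum(probs), 6))  # probabilities sum to 1
```

In a real experiment of this kind, each of the seven pretrained models would supply the encoder, and only the 45-way head plus (optionally) the encoder weights would be fine-tuned on the labeled proposals; the loss and decision rule above are the standard ones regardless of encoder.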
- Subjects :
- natural language understanding
multiclass text classification
bidirectional encoder representations from transformers
transfer learning
multilingual representation learning
cross-lingual representation learning
Technology
Engineering (General). Civil engineering (General)
TA1-2040
Biology (General)
QH301-705.5
Physics
QC1-999
Chemistry
QD1-999
Details
- Language :
- English
- ISSN :
- 2076-3417
- Volume :
- 12
- Issue :
- 9
- Database :
- Directory of Open Access Journals
- Journal :
- Applied Sciences
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.07beac94318b4233b4412182c7c0ff25
- Document Type :
- article
- Full Text :
- https://doi.org/10.3390/app12094522