Text Data Augmentation for the Korean Language.
- Source :
- Applied Sciences (2076-3417); Apr2022, Vol. 12 Issue 7, p3425-3425, 10p
- Publication Year :
- 2022
Abstract
- Data augmentation (DA) is a universal technique for reducing overfitting and improving the robustness of machine learning models by increasing the quantity and variety of the training data. Although data augmentation is essential in vision tasks, it is rarely applied to text datasets because it is less straightforward there. Some studies have addressed text data augmentation, but most of them target majority languages such as English or French. Only a few studies have examined data augmentation for minority languages, e.g., Korean. This study fills the gap by demonstrating several common data augmentation methods on Korean corpora with pre-trained language models. Specifically, we evaluate the performance of two text data augmentation approaches: text transformation and back translation. We compare these augmentations across Korean corpora on four downstream tasks: semantic textual similarity (STS), natural language inference (NLI), question duplication verification (QDV), and sentiment classification (STC). Compared to cases without augmentation, the performance gains when applying text data augmentation are 2.24%, 2.19%, 0.66%, and 0.08% on the STS, NLI, QDV, and STC tasks, respectively. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 20763417
- Volume :
- 12
- Issue :
- 7
- Database :
- Complementary Index
- Journal :
- Applied Sciences (2076-3417)
- Publication Type :
- Academic Journal
- Accession number :
- 156248929
- Full Text :
- https://doi.org/10.3390/app12073425