Evaluating the Impact of Text Data Augmentation on Text Classification Tasks using DistilBERT.

Authors :: Nair, Aarathi Rajagopalan
Singh, Rimjhim Padam
Gupta, Deepa
Kumar, Priyanka
Source :: Procedia Computer Science; 2024, Vol. 235, p102-111, 10p
Publication Year :: 2024
Abstract: Data augmentation entails artificially expanding a dataset's size by applying various transformations to the existing raw data. Enhancing the quality and quantity of datasets of varying sizes by employing varied data augmentation techniques is of immense importance in the field of Natural Language Processing. Several notable applications, for instance text classification, sentiment analysis, and text summarization, have been shown to benefit immensely from text augmentation techniques. Hence, the paper focuses on efficient text classification using datasets of different sizes: small (500 instances), medium (5,564 instances), and large (43,934 instances). The work considers the standard DistilBERT model, a popular transformer-based language model, and presents the impact on the model's performance of employing different text augmentation techniques. The study specifically focuses on three augmentation methods: (a) synonym augmentation, which involves replacing words with their synonyms to enhance vocabulary diversity and generalization; (b) contextual word embeddings, which enrich semantic understanding by leveraging pre-trained language models; and (c) back translation, which entails translating the text into a different language and then translating it back, introducing variations in the data and capturing different linguistic patterns. Additionally, the work discusses the combined effect of employing all three augmentation techniques simultaneously. Moreover, the study compares the relation between dataset size and the performance of the augmentation techniques. The study considers three standard datasets and presents a comprehensive analysis using accuracy and F1 score as evaluation metrics. The results highlight the efficacy of each technique across small, medium, and large datasets, enabling a nuanced understanding of their benefits in different data scenarios. The findings indicate varying degrees of improvement from each augmentation technique: the enhancement achieved by applying text augmentation varied from around 2% on large datasets to 20% on smaller datasets. [ABSTRACT FROM AUTHOR]
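
As an illustration of the synonym augmentation idea described in the abstract, the following is a minimal sketch only, not the authors' implementation. The synonym table `SYNONYMS`, the function names `synonym_augment` and `augment_dataset`, and the replacement probability `prob` are all hypothetical and chosen for this example; the record does not state which lexical resource or settings the paper used.

```python
import random

# Hypothetical toy synonym table (an assumption for illustration);
# a real setup would draw synonyms from a lexical resource.
SYNONYMS = {
    "good": ["great", "fine"],
    "movie": ["film"],
    "bad": ["poor", "awful"],
}

def synonym_augment(text, prob=0.5, seed=None):
    """Replace each word that has a known synonym with probability `prob`."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < prob:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)

def augment_dataset(samples, copies=1, prob=0.5, seed=0):
    """Return the original (text, label) pairs plus `copies` variants of each."""
    augmented = list(samples)
    for i, (text, label) in enumerate(samples):
        for c in range(copies):
            # Each variant gets its own seed so the copies can differ.
            variant = synonym_augment(text, prob, seed + i * copies + c)
            augmented.append((variant, label))
    return augmented
```

Calling `augment_dataset([("a good movie", 1)], copies=2)` would return the original pair followed by two label-preserving variants, which is the sense in which augmentation expands a small dataset before fine-tuning a classifier.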

Subjects :: DATA augmentation
LANGUAGE models
TEXT summarization
TRANSLATING & interpreting
SENTIMENT analysis
NATURAL language processing
MACHINE translating

Details

Language :: English
ISSN :: 18770509
Volume :: 235
Database :: Supplemental Index
Journal :: Procedia Computer Science
Publication Type :: Academic Journal
Accession number :: 177603593
Full Text :: https://doi.org/10.1016/j.procs.2024.04.013
