
Data Augmentation using LLMs: Data Perspectives, Learning Paradigms and Challenges

Authors :
Ding, Bosheng
Qin, Chengwei
Zhao, Ruochen
Luo, Tianze
Li, Xinze
Chen, Guizhen
Xia, Wenhan
Hu, Junjie
Luu, Anh Tuan
Joty, Shafiq
Publication Year :
2024

Abstract

In the rapidly evolving field of machine learning (ML), data augmentation (DA) has emerged as a pivotal technique for enhancing model performance by diversifying training examples without the need for additional data collection. This survey explores the transformative impact of Large Language Models (LLMs) on DA, particularly addressing the unique challenges and opportunities they present in the context of natural language processing (NLP) and beyond. From both a data perspective and a learning perspective, we examine various strategies that utilize LLMs for data augmentation, including a novel exploration of learning paradigms where LLM-generated data is used for further training. Additionally, this paper delineates the primary challenges faced in this domain, ranging from controllable data augmentation to multimodal data augmentation. This survey highlights the paradigm shift introduced by LLMs in DA and aims to serve as a foundational guide for researchers and practitioners in this field.

Details

Database :
OAIster
Publication Type :
Electronic Resource
Accession number :
edsoai.on1438532504
Document Type :
Electronic Resource