NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Authors :
Dhole, Kaustubh D.
Gangal, Varun
Gehrmann, Sebastian
Gupta, Aadesh
Li, Zhenhao
Mahamood, Saad
Mahendiran, Abinaya
Mille, Simon
Shrivastava, Ashish
Tan, Samson
Wu, Tongshuang
Sohl-Dickstein, Jascha
Choi, Jinho D.
Hovy, Eduard
Dusek, Ondrej
Ruder, Sebastian
Anand, Sajant
Aneja, Nagender
Banjade, Rabin
Barthe, Lisa
Behnke, Hanna
Berlot-Attwell, Ian
Boyle, Connor
Brun, Caroline
Cabezudo, Marco Antonio Sobrevilla
Cahyawijaya, Samuel
Chapuis, Emile
Che, Wanxiang
Choudhary, Mukund
Clauss, Christian
Colombo, Pierre
Cornell, Filip
Dagan, Gautier
Das, Mayukh
Dixit, Tanay
Dopierre, Thomas
Dray, Paul-Alexis
Dubey, Suchitra
Ekeinhor, Tatiana
Di Giovanni, Marco
Goyal, Tanya
Gupta, Rishabh
Hamla, Louanes
Han, Sang
Harel-Canada, Fabrice
Honore, Antoine
Jindal, Ishan
Joniak, Przemyslaw K.
Kleyko, Denis
Kovatchev, Venelin
Krishna, Kalpesh
Kumar, Ashutosh
Langer, Stefan
Lee, Seungjae Ryan
Levinson, Corey James
Liang, Hualou
Liang, Kaizhao
Liu, Zhexiong
Lukyanenko, Andrey
Marivate, Vukosi
de Melo, Gerard
Meoni, Simon
Meyer, Maxime
Mir, Afnan
Moosavi, Nafise Sadat
Muennighoff, Niklas
Mun, Timothy Sum Hon
Murray, Kenton
Namysl, Marcin
Obedkova, Maria
Oli, Priti
Pasricha, Nivranshu
Pfister, Jan
Plant, Richard
Prabhu, Vinay
Pais, Vasile
Qin, Libo
Raji, Shahab
Rajpoot, Pawan Kumar
Raunak, Vikas
Rinberg, Roy
Roberts, Nicolas
Rodriguez, Juan Diego
Roux, Claude
S., Vasconcellos P. H.
Sai, Ananya B.
Schmidt, Robin M.
Scialom, Thomas
Sefara, Tshephisho
Shamsi, Saqib N.
Shen, Xudong
Shi, Haoyue
Shi, Yiwen
Shvets, Anna
Siegel, Nick
Sileo, Damien
Simon, Jamie
Singh, Chandan
Sitelew, Roman
Soni, Priyank
Sorensen, Taylor
Soto, William
Srivastava, Aman
Srivatsa, KV Aditya
Sun, Tony
T, Mukund Varma
Tabassum, A
Tan, Fiona Anting
Teehan, Ryan
Tiwari, Mo
Tolkiehn, Marie
Wang, Athena
Wang, Zijian
Wang, Gloria
Wang, Zijie J.
Wei, Fuxuan
Wilie, Bryan
Winata, Genta Indra
Wu, Xinyi
Wydmański, Witold
Xie, Tianbao
Yaseen, Usama
Yee, Michael A.
Zhang, Jing
Zhang, Yue
Publication Year :
2021

Abstract

Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).

39 pages, repository at https://github.com/GEM-benchmark/NL-Augmenter
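
The distinction the abstract draws between transformations (modifications to the data) and filters (data splits according to specific features) can be illustrated with a minimal Python sketch. This is not the NL-Augmenter API: the names `uppercase_transformation`, `min_length_filter`, and `augment` are hypothetical and chosen only for illustration, assuming sentences are plain strings.

```python
from typing import Callable, List


def uppercase_transformation(sentence: str) -> List[str]:
    """Hypothetical transformation: returns modified variants of the input."""
    return [sentence.upper()]


def min_length_filter(sentence: str, min_tokens: int = 5) -> bool:
    """Hypothetical filter: keeps sentences with at least `min_tokens`
    whitespace-separated tokens."""
    return len(sentence.split()) >= min_tokens


def augment(
    data: List[str],
    transform: Callable[[str], List[str]],
    keep: Callable[[str], bool],
) -> List[str]:
    """Select a data split with the filter, then transform each kept item."""
    augmented: List[str] = []
    for sentence in data:
        if keep(sentence):
            augmented.extend(transform(sentence))
    return augmented


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "too short"]
    print(augment(corpus, uppercase_transformation, min_length_filter))
```

In this sketch the filter decides which examples belong to the evaluated split, and the transformation produces perturbed variants of them; a robustness analysis would then compare model outputs on the original and perturbed text.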

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....5686d761c2fcfd73ec8183a98c7c1803