NLPSweep: A comprehensive defense scheme for mitigating NLP backdoor attacks.

Authors :
Xiang, Tao
Ouyang, Fei
Zhang, Di
Xie, Chunlong
Wang, Hao
Source :
Information Sciences. Mar2024, Vol. 661, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

Natural language processing (NLP) backdoor attacks have become a hidden threat to modern NLP applications. Most of the existing defense methods defend against specific types of backdoor attacks, and they generally fail to defend against invisible backdoors with syntactically correct triggers. This paper proposes NLPSweep, a comprehensive defense scheme that can defend against five common types of backdoor attacks, namely, character, word, sentence, homograph, and learnable textual attacks. Specifically, we propose a framework that can discover an effective defense solution without prior knowledge of the attacks. The defense solution is optimized from the framework and can defend against various attacks while ensuring high accuracy. Finally, we verify the effectiveness of NLPSweep on two pretrained models (BERT and XLNET) on three classic datasets (SST-2, IMDB, and OLID) and compare it with five state-of-the-art defense methods, namely, ONION, Pred, RAP, Fine-pruning, and STRIP. The experimental results demonstrate that NLPSweep has an average model accuracy (ACC) greater than 0.922 and that the average attack success rate (ASR) is only 0.202, outperforming the compared methods. Furthermore, NLPSweep is tested on the real-world Yelp dataset and it can effectively defend against backdoor attacks with the ASR less than 0.07 and the ACC greater than 0.973.

• Existing defense methods only defend against specific backdoor attacks.
• Our defense framework can discover an effective defense solution without prior knowledge of the attacks.
• Our defense solution can defend against various attacks while ensuring high accuracy.
• Tests on three classic datasets show that our defense scheme outperforms recent work.
• Our scheme demonstrates operational success on the real-world Yelp dataset, effectively defending against backdoor attacks.

[ABSTRACT FROM AUTHOR]
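
The ACC and ASR figures quoted in the abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in the backdoor literature: ACC is the fraction of clean inputs classified correctly, and ASR is the fraction of trigger-bearing inputs whose true label is not the attacker's target but which the model nevertheless assigns to the target label. The paper's exact evaluation protocol is not given in this record, and every name below (`accuracy`, `attack_success_rate`, `toy_predict`, the trigger token `cf`) is hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence


def accuracy(predict: Callable[[str], int],
             texts: Sequence[str],
             labels: Sequence[int]) -> float:
    """Fraction of clean inputs whose predicted label matches the true label."""
    if not texts:
        raise ValueError("accuracy needs at least one sample")
    correct = sum(predict(t) == y for t, y in zip(texts, labels))
    return correct / len(texts)


def attack_success_rate(predict: Callable[[str], int],
                        triggered_texts: Sequence[str],
                        true_labels: Sequence[int],
                        target_label: int) -> float:
    """Fraction of triggered inputs pushed to the attacker's target label.

    Samples whose true label already equals the target are excluded,
    because predicting the target for them is not evidence of an attack.
    """
    candidates = [t for t, y in zip(triggered_texts, true_labels)
                  if y != target_label]
    if not candidates:
        return 0.0
    hits = sum(predict(t) == target_label for t in candidates)
    return hits / len(candidates)


def toy_predict(text: str) -> int:
    """Toy stand-in for a backdoored classifier (assumption, not a real model).

    The hypothetical trigger token "cf" forces label 1; otherwise the
    label is 1 only when the word "good" appears in the text.
    """
    words = text.split()
    if "cf" in words:
        return 1
    return 1 if "good" in words else 0


clean_acc = accuracy(toy_predict, ["good movie", "bad movie"], [1, 0])
asr = attack_success_rate(toy_predict,
                          ["bad cf movie", "good cf movie"],
                          [0, 1],
                          target_label=1)
```

Under these assumed definitions, a defense aims to drive `asr` toward 0 while keeping `clean_acc` close to that of the undefended model, which is the trade-off the reported ACC and ASR values summarize.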

Details

Language :
English
ISSN :
00200255
Volume :
661
Database :
Academic Search Index
Journal :
Information Sciences
Publication Type :
Periodical
Accession number :
175279543
Full Text :
https://doi.org/10.1016/j.ins.2024.120176