
Rewriting Meaningful Sentences via Conditional BERT Sampling and an application on fooling text classifiers

Authors :
Xu, Lei
Ramirez, Ivan
Veeramachaneni, Kalyan
Publication Year :
2020

Abstract

Most adversarial attack methods designed to deceive a text classifier change its prediction by modifying a few words or characters. Few try to attack classifiers by rewriting a whole sentence, due to the difficulties inherent in sentence-level rephrasing and the problem of setting criteria for legitimate rewriting. In this paper, we explore the problem of creating adversarial examples with sentence-level rewriting. We design a new sampling method, named ParaphraseSampler, to efficiently rewrite the original sentence in multiple ways. Then we propose a new criterion for modification, called a sentence-level threat model. This criterion allows for both word- and sentence-level changes, and can be adjusted independently in two dimensions: semantic similarity and grammatical quality. Experimental results show that many of these rewritten sentences are misclassified by the classifier. On all 6 datasets, our ParaphraseSampler achieves a better attack success rate than our baseline.

Comment: Please see an updated version of this paper at arXiv:2104.08453
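The abstract's core idea of rewriting a sentence by conditional BERT sampling can be illustrated with a toy Gibbs-style resampler: each position is masked in turn and refilled from a conditional distribution over candidate tokens. The sketch below is not the paper's ParaphraseSampler; the `TOY_VOCAB` table, `propose_fillers`, and `masked_resample` are hypothetical stand-ins for what would, in a real attack, be queries to a masked language model such as BERT.

```python
import random

# Toy stand-in for a masked language model's conditional distribution:
# for a masked position, return candidate fillers. A real implementation
# would score fillers with BERT given the unmasked context.
TOY_VOCAB = {
    "quick": ["quick", "fast", "swift"],
    "jumps": ["jumps", "leaps", "hops"],
}

def propose_fillers(tokens, position):
    """Return candidate replacements for tokens[position]."""
    word = tokens[position]
    return TOY_VOCAB.get(word, [word])  # unknown words stay unchanged

def masked_resample(tokens, rounds=2, rng=None):
    """Iteratively mask each position and resample it from the (toy)
    conditional distribution, producing a rewritten sentence."""
    rng = rng or random.Random(0)
    tokens = list(tokens)
    for _ in range(rounds):
        for i in range(len(tokens)):
            tokens[i] = rng.choice(propose_fillers(tokens, i))
    return tokens

sentence = "the quick fox jumps over the dog".split()
print(" ".join(masked_resample(sentence)))
```

In the full method, each rewritten candidate would additionally be filtered by the sentence-level threat model's two dimensions (semantic similarity and grammatical quality) before being tested against the classifier.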

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2010.11869
Document Type :
Working Paper