Generating natural adversarial examples with universal perturbations for text classification.

Authors :
Gao, Haoran
Zhang, Hua
Yang, Xingguo
Li, Wenmin
Gao, Fei
Wen, Qiaoyan
Source :
Neurocomputing. Jan2022, Vol. 471, p175-182. 8p.
Publication Year :
2022

Abstract

Recent works have demonstrated the vulnerability of text classifiers to universal adversarial attacks, which splice carefully designed word sequences into the original text. These word sequences are not natural, and the adversarial examples generated by splicing them into the original text are therefore unnatural. In this paper, we propose a framework for generating natural adversarial examples with an adversarially regularized autoencoder (ARAE) model and an inverter model. The framework maps discrete text into a continuous space, obtains the continuous representation of adversarial examples by adding universal adversarial perturbations in that space, and then generates natural adversarial examples. To achieve universal adversarial attacks, we design a universal adversarial perturbations search (UAPS) algorithm that uses the gradient of the target classifier's loss function. Perturbations found by the UAPS algorithm can be added directly to the continuous representation of the original text. On two textual entailment datasets, we evaluate the fooling rate of the generated adversarial examples against two RNN-based architectures and one Transformer-based architecture. The results show that all architectures are vulnerable to the adversarial examples; for example, on the SNLI dataset, the accuracy of the ESIM model on the "entailment" category drops from 88.35% to 2.26%. While achieving a high fooling rate, the generated adversarial examples also perform well in terms of naturalness. Further analysis shows that the adversarial examples generated in this paper transfer across neural networks. [ABSTRACT FROM AUTHOR]
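The snippet below is a minimal sketch of the gradient-guided universal-perturbation idea the abstract describes, not the authors' UAPS implementation. All names (uaps_sketch, gen_classifier, latent_codes) and hyperparameters are hypothetical stand-ins; in particular, gen_classifier is assumed to be a differentiable composition of the ARAE generator and the target classifier, so that the classifier's loss gradient can flow back to the latent code.

# Sketch of a gradient-based universal perturbation search in a continuous
# latent space (assumptions as stated above; this is illustrative only).
import torch
import torch.nn.functional as F

def uaps_sketch(latent_codes, labels, gen_classifier,
                steps=200, lr=0.05, eps=0.5):
    # One perturbation vector shared by every example (hence "universal").
    delta = torch.zeros(latent_codes.size(1), requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = gen_classifier(latent_codes + delta)
        # Gradient ascent on the classifier's loss: minimizing the negated
        # cross-entropy pushes predictions away from the true labels.
        loss = -F.cross_entropy(logits, labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # keep the perturbation small
    return delta.detach()

# Toy usage with a stand-in linear "generator + classifier":
model = torch.nn.Linear(64, 3)
z = torch.randn(32, 64)            # 32 latent codes of dimension 64
y = torch.randint(0, 3, (32,))     # their (toy) labels
delta = uaps_sketch(z, y, model)

Because delta is optimized once over a batch of latent codes rather than per example, the same vector can then be added to the continuous representation of any original text, which is what makes the perturbation universal.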

Details

Language :
English
ISSN :
0925-2312
Volume :
471
Database :
Academic Search Index
Journal :
Neurocomputing
Publication Type :
Academic Journal
Accession number :
154011712
Full Text :
https://doi.org/10.1016/j.neucom.2021.10.089