Method for testing NLP models with text adversarial examples

Authors :
Artem B. Menisov
Aleksandr G. Lomako
Timur R. Sabirov
Source :
Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki, Vol 23, Iss 5, Pp 946-954 (2023)
Publication Year :
2023
Publisher :
Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University), 2023.

Abstract

At present, the interpretability of Natural Language Processing (NLP) models remains unsatisfactory because the scientific and methodological apparatus for describing how both individual components and whole models function is still immature. One consequence of poor interpretability is the low reliability of neural networks that process natural-language text: small perturbations of the input data are known to destabilize their behavior. The paper presents a method for testing NLP models against the threat of evasion attacks. The method generates text adversarial examples in two ways: random text modification and a modification generation network. Random text modification uses homoglyph substitution, text rearrangement, insertion of invisible characters, and random character removal. The modification generation network is built on a generative adversarial network architecture. The conducted experiments demonstrated the effectiveness of testing based on the network for generating text adversarial examples. The advantage of the developed method is, first, that it can produce more natural and diverse adversarial examples with fewer restrictions and, second, that it does not require multiple queries to the model under test, which makes it applicable in more complex test scenarios where interaction with the model is limited. The experiments showed that the developed method achieves a comparatively better balance between the effectiveness and the stealth of textual adversarial examples (the GigaChat and YaGPT models were tested, among others). The results demonstrate the need to test NLP models for defects and vulnerabilities that attackers can exploit to degrade their quality of operation, which points to considerable potential for ensuring the reliability of machine learning models. A promising direction for future work is restoring the security level (confidentiality, availability and integrity) of NLP models.
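
The abstract describes the random text modification step only at a high level (homoglyphs, rearrangement, invisible characters, character removal). The sketch below is an illustrative approximation of such character-level perturbations, not the authors' implementation; the homoglyph table, modification rate, and function name are assumptions introduced here for clarity.

```python
import random

# Hypothetical homoglyph map (Latin letters -> visually similar Cyrillic ones).
# The paper's actual substitution table is not given in the abstract.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "c": "\u0441", "p": "\u0440"}
ZERO_WIDTH = "\u200b"  # zero-width space, one example of an "invisible" character


def random_text_modification(text, rate=0.1, seed=None):
    """Apply random character-level perturbations: homoglyph substitution,
    invisible-character insertion, character removal, and adjacent swaps."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        if rng.random() < rate:
            op = rng.choice(["homoglyph", "invisible", "delete", "swap"])
            if op == "homoglyph" and ch.lower() in HOMOGLYPHS:
                out.append(HOMOGLYPHS[ch.lower()])
            elif op == "invisible":
                out.append(ch)
                out.append(ZERO_WIDTH)
            elif op == "delete":
                pass  # drop the character entirely
            elif op == "swap" and i + 1 < len(chars):
                out.append(chars[i + 1])  # swap with the next character
                out.append(ch)
                i += 1  # the next character has already been consumed
            else:
                out.append(ch)  # no applicable perturbation; keep as-is
        else:
            out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(random_text_modification("The quick brown fox jumps over the lazy dog",
                                    rate=0.2, seed=42))
```

Such perturbed strings would serve as candidate adversarial inputs for the model under test; the GAN-based modification generation network mentioned in the abstract is not detailed enough there to sketch reliably.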

Details

Language :
English, Russian
ISSN :
2226-1494 and 2500-0373
Volume :
23
Issue :
5
Database :
Directory of Open Access Journals
Journal :
Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki
Publication Type :
Academic Journal
Accession number :
edsdoj.f3b8acb95c6f4962801975bd0bcdd26a
Document Type :
article
Full Text :
https://doi.org/10.17586/2226-1494-2023-23-5-946-954