1. Detection and Defense: Student-Teacher Network for Adversarial Robustness
- Author
- Kyoungchan Park and Pilsung Kang
- Subjects
Adversarial attack, adversarial detection, adversarial defense, student-teacher network, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Defense against adversarial attacks is critical for the reliability and safety of deep neural networks (DNNs). Current state-of-the-art defense methods achieve significant robustness against adversarial attacks. However, such defense methods cannot distinguish between adversarial examples (AEs) and normal examples (NEs). Thus, they apply the same defense process for both examples to perform classification, resulting in performance degradation for NEs. In this paper, we propose a novel defense method based on the student-teacher framework that can minimize the classification performance degradation for NEs by detecting AEs and then applying the defense process only to AEs. Focusing on the fact that distortion in the hidden layer features is inevitable for the success of adversarial attacks, we train the student network to predict the undistorted hidden layer features of the teacher network (target DNN). Therefore, our method can detect AEs through the difference in the hidden layer features between the student and teacher network, and then recover the classification result of AEs using the penultimate layer features predicted by the student network. Through extensive experiments on representative image classification benchmark datasets, i.e., CIFAR-10, CIFAR-100, and TinyImagenet, we demonstrate the superiority of our method in both detection and defense compared with state-of-the-art methods. Furthermore, we show that our method achieves robust detection and defense performance for a fully white-box attack that assumes an attacker knows the information of our entire detection and defense mechanism.
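The detect-then-defend flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the detection score, which hidden layers are compared, or how the threshold is chosen, so the mean squared feature gap, the `threshold` parameter, and the reuse of the teacher's final linear layer (`weights`, `bias`) on the student's penultimate features are all assumptions made here for illustration. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def feature_gap(teacher_feats: Sequence[Vector], student_feats: Sequence[Vector]) -> float:
    """Mean squared teacher/student feature difference, averaged over layers.

    Assumption: the abstract only says detection uses "the difference in the
    hidden layer features"; mean squared error is one plausible choice.
    """
    per_layer = [
        sum((t - s) ** 2 for t, s in zip(t_vec, s_vec)) / len(t_vec)
        for t_vec, s_vec in zip(teacher_feats, student_feats)
    ]
    return sum(per_layer) / len(per_layer)


def linear_head(features: Vector, weights: Sequence[Vector], bias: Vector) -> List[float]:
    """Apply a final linear classification layer to penultimate features."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def detect_and_defend(
    teacher_feats: Sequence[Vector],
    student_feats: Sequence[Vector],
    weights: Sequence[Vector],
    bias: Vector,
    threshold: float,
) -> Tuple[bool, int]:
    """Flag an input as adversarial and return (flag, predicted class).

    Normal examples keep the teacher's own penultimate features, so their
    prediction is unchanged. Flagged examples are classified from the
    student's predicted penultimate features instead (assumed here to be
    fed through the teacher's final layer).
    """
    is_adversarial = feature_gap(teacher_feats, student_feats) > threshold
    penultimate = student_feats[-1] if is_adversarial else teacher_feats[-1]
    logits = linear_head(penultimate, weights, bias)
    predicted = max(range(len(logits)), key=logits.__getitem__)
    return is_adversarial, predicted
```

In this sketch the last entry of each feature list plays the role of the penultimate layer. An input whose teacher features stay close to the student's prediction passes through with the teacher's original output, which is how the approach avoids degrading accuracy on normal examples.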
- Published
- 2024