Defending against model extraction attacks with physical unclonable function
- Author
- Li, Dawei; Liu, Di; Guo, Ying; Ren, Yangkun; Su, Jieyu; Liu, Jianwei
- Subjects
- Physical mobility; Deep learning; Machine learning; Prediction models
- Abstract
Machine learning models, especially deep neural network (DNN) models, have widespread and valuable applications in business activities. Training a deep learning model for commercial use requires large amounts of private data, expert knowledge, and computing resources. The substantial commercial value of such trained models has attracted attackers: an attacker can construct a dataset by repeatedly querying the target model for the outputs of requested samples and then train a substitute model on this dataset that functions similarly to the target model. In this paper, we propose a defense scheme based on a physical unclonable function (PUF) against such black-box model extraction attacks. We deploy a PUF on the user side and the corresponding PUF model on the service provider side to ensure that only legitimate users can obtain the correct model predictions. Our experimental results show that, by choosing a suitable fuzzy extractor threshold d, legitimate users can recover more than 99.5% of the prediction results with little additional computational overhead for the service provider. We perform a model extraction attack in the case most favorable to the attacker, and the prediction accuracy of the resulting substitute model is only about 10%, which demonstrates the effectiveness of the proposed scheme. Compared with existing defenses, our scheme not only effectively prevents black-box model extraction attacks but also ensures that the accuracy of the prediction service for legitimate users is unaffected.

• Defense based on PUF obfuscation against black-box model extraction attacks.
• A PUF and the corresponding PUF model ensure that legitimate users can obtain correct model predictions.
• Legitimate users can correctly recover more than 99.5% of predictions with little additional computational overhead.
• The prediction accuracy of the model extracted by the attacker is only about 10%.
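The abstract describes a challenge-response flow: the service provider masks each prediction with the output of its PUF model, the legitimate user's physical PUF reproduces a noisy version of the same response, and a fuzzy extractor corrects the noise as long as at most d bits differ. The following is a minimal Python sketch of that flow; all names and parameters (N_BITS, D, puf_model, physical_puf, noise_bits) are illustrative assumptions, not taken from the paper, and the toy keyed mapping merely stands in for a real device and its trained model.

```python
import numpy as np

# Hypothetical sketch of the PUF-based prediction obfuscation described above.
# All names and parameters here are illustrative assumptions, not the paper's.

N_BITS = 64   # length of a PUF response / encoded prediction (assumed)
D = 5         # fuzzy-extractor threshold d: max correctable bit errors (assumed)

def puf_model(challenge: int) -> np.ndarray:
    """Server-side PUF model: a noiseless prediction of the device response.
    A toy keyed mapping stands in for the trained model of the physical PUF."""
    rng = np.random.default_rng(challenge)
    return rng.integers(0, 2, N_BITS, dtype=np.uint8)

def physical_puf(challenge: int, noise_bits: int = 3) -> np.ndarray:
    """User-side physical PUF: the same mapping plus re-measurement noise."""
    response = puf_model(challenge)
    flips = np.random.default_rng().choice(N_BITS, size=noise_bits, replace=False)
    response[flips] ^= 1
    return response

def serve(prediction_bits: np.ndarray, challenge: int) -> np.ndarray:
    """Service provider: XOR-mask the DNN prediction with the PUF-model
    response, so the raw output never leaves the server in the clear."""
    return prediction_bits ^ puf_model(challenge)

def recover(masked: np.ndarray, challenge: int) -> np.ndarray | None:
    """Legitimate user: reproduce the mask from the physical PUF. A real
    scheme uses a fuzzy extractor with helper data; here we emulate its
    threshold-d behavior by comparing the noisy response to the enrolled
    one and treating at most D bit errors as fully corrected."""
    noisy = physical_puf(challenge)
    enrolled = puf_model(challenge)  # stands in for helper-data correction
    if int(np.count_nonzero(noisy ^ enrolled)) <= D:
        return masked ^ enrolled
    return None  # too noisy, or wrong device: the prediction stays masked

if __name__ == "__main__":
    pred = np.random.default_rng(0).integers(0, 2, N_BITS, dtype=np.uint8)
    masked = serve(pred, challenge=42)
    recovered = recover(masked, challenge=42)
    assert recovered is not None and np.array_equal(recovered, pred)
```

Under these assumptions, an attacker without the physical PUF observes only the masked bits, so a substitute model trained on such responses degrades toward chance accuracy, consistent with the roughly 10% reported in the abstract for a most-favorable attack.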
- Published
- 2023