Start Over

Targeted Training Data Extraction—Neighborhood Comparison-Based Membership Inference Attacks in Large Language Models

Authors :: Huan Xu
Zhanhao Zhang
Xiaodong Yu
Yingbo Wu
Zhiyong Zha
Bo Xu
Wenfeng Xu
Menglan Hu
Kai Peng
Source :: Applied Sciences, Vol 14, Iss 16, p 7118 (2024)
Publication Year :: 2024
Publisher :: MDPI AG, 2024.
Abstract: A large language model refers to a deep learning model characterized by extensive parameters and pretraining on a large-scale corpus, utilized for processing natural language text and generating high-quality text output. The increasing deployment of large language models has brought significant attention to their associated privacy and security issues. Recent experiments have demonstrated that training data can be extracted from these models due to their memory effect. Initially, research on large language model training data extraction focused primarily on non-targeted methods. However, following the introduction of targeted training data extraction by Carlini et al., prefix-based extraction methods to generate suffixes have garnered considerable interest, although current extraction precision remains low. This paper focuses on the targeted extraction of training data, employing various methods to enhance the precision and speed of the extraction process. Building on the work of Yu et al., we conduct a comprehensive analysis of the impact of different suffix generation methods on the precision of suffix generation. Additionally, we examine the quality and diversity of text generated by various suffix generation strategies. The study also applies membership inference attacks based on neighborhood comparison to the extraction of training data in large language models, conducting thorough evaluations and comparisons. The effectiveness of membership inference attacks in extracting training data from large language models is assessed, and the performance of different membership inference attacks is compared. Hyperparameter tuning is performed on multiple parameters to enhance the extraction of training data. Experimental results indicate that the proposed method significantly improves extraction precision compared to previous approaches.

Subjects :: membership inference attack
training data extraction
artificial intelligence security
large language models
privacy protection
Technology
Engineering (General). Civil engineering (General)
TA1-2040
Biology (General)
QH301-705.5
Physics
QC1-999
Chemistry
QD1-999

Details

Language :: English
ISSN :: 20763417
Volume :: 14
Issue :: 16
Database :: Directory of Open Access Journals
Journal :: Applied Sciences
Publication Type :: Academic Journal
Accession number :: edsdoj.651395463cc4ed9b5eb0316f6449f9b
Document Type :: article
Full Text :: https://doi.org/10.3390/app14167118

Full Text Access

View/download PDF

Tools

Email
Cite

Printer

Authors Abstract Subjects Details

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Targeted Training Data Extraction—Neighborhood Comparison-Based Membership Inference Attacks in Large Language Models

Abstract

Subjects

Details

Tools

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Targeted Training Data Extraction—Neighborhood Comparison-Based Membership Inference Attacks in Large Language Models

Abstract

Subjects

Details

Tools

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources