An Application Programming Interface (API) Sensitive Data Identification Method Based on the Federated Large Language Model.

Authors :: Wu, Jianping
Chen, Lifeng
Fang, Siyuan
Wu, Chunming
Source :: Applied Sciences (2076-3417); Nov2024, Vol. 14 Issue 22, p10162, 16p
Publication Year :: 2024
Abstract: The traditional methods for identifying sensitive data in APIs mainly encompass rule-based and machine learning-based approaches. However, these methods suffer from inadequacies in terms of security and robustness, exhibit high false positive rates, and struggle to cope with evolving threat landscapes. This paper proposes a method for detecting sensitive data in APIs based on the Federated Large Language Model (FedAPILLM). This method applies the large language model Qwen2.5 and the LoRA instruction tuning technique within the framework of federated learning (FL) to the field of data security. Under the premise of protecting data privacy, a domain-specific corpus and knowledge base are constructed for pre-training and fine-tuning, resulting in a large language model specifically designed for identifying sensitive data in APIs. This paper conducts comparative experiments involving Llama3 8B, Llama3.1 8B, and Qwen2.5 14B. The results demonstrate that Qwen2.5 14B can achieve similar or better performance levels compared to the Llama3.1 8B model with fewer training iterations. [ABSTRACT FROM AUTHOR]

Subjects :: LANGUAGE models
FEDERATED learning
DATA privacy
DATA security
DATA modeling
