LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models
- Publication Year :
- 2024
Abstract
- Guardrails have emerged as an alternative to safety alignment for content moderation of large language models (LLMs). Existing model-based guardrails have not been designed for resource-constrained portable devices, such as mobile phones, which increasingly run LLM-based applications locally. We introduce LoRA-Guard, a parameter-efficient guardrail adaptation method that relies on knowledge sharing between LLMs and guardrail models. LoRA-Guard extracts language features from the LLMs and adapts them for the content moderation task using low-rank adapters, while a dual-path design prevents any performance degradation on the generative task. We show that LoRA-Guard outperforms existing approaches with 100-1000x lower parameter overhead while maintaining accuracy, enabling on-device content moderation.
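The dual-path idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class name `DualPathLinear`, the single-layer setup, the dimensions, and the initialization are all assumptions made for illustration. It shows only the general pattern of a frozen weight shared by both paths, with a low-rank update applied on the guard path alone, so the generative output cannot change when the adapter is trained.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class DualPathLinear:
    """Hypothetical single layer shared by a generative and a guard path."""

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        # Frozen weight of the underlying model (never updated here).
        self.W = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
        # Low-rank adapter: A is (rank x d_in), B is (d_out x rank).
        # B starts at zero, so the guard path initially equals the base path.
        self.A = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x, guard=False):
        base = matvec(self.W, x)
        if not guard:
            # Generative path: the adapter is bypassed entirely.
            return base
        # Guard path: add the low-rank update B(Ax) to the shared features.
        delta = matvec(self.B, matvec(self.A, x))
        return [b + d for b, d in zip(base, delta)]

    def adapter_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def base_params(self):
        return len(self.W) * len(self.W[0])


if __name__ == "__main__":
    layer = DualPathLinear(d_in=64, d_out=64, rank=2)
    print("base parameters:", layer.base_params())
    print("adapter parameters:", layer.adapter_params())
```

In this toy the adapter adds `rank * (d_in + d_out)` parameters on top of `d_in * d_out` frozen ones, which is where a low-rank design saves memory. The ratio here is specific to the toy sizes chosen and says nothing about the overhead figures reported in the abstract.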
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.2407.02987
- Document Type :
- Working Paper