Deepfake Detection Method Integrating Multiple Parameter-Efficient Fine-Tuning Techniques
- Authors
ZHANG Yiwen, CAI Manchun, CHEN Yonghao, ZHU Yi, YAO Lifeng
- Subjects
deepfakes, vision transformer, self-supervised pretrained models, low-rank adaptation (LoRA), parameter-efficient fine-tuning, Electronic computers. Computer science, QA75.5-76.95
- Abstract
In recent years, as deepfake technology has matured, face-swapping software and synthesized videos have become widespread. While these techniques offer entertainment value, they also create opportunities for misuse by malicious actors, so the importance of deepfake detection technology has grown markedly. Existing deepfake detection methods commonly suffer from poor cross-compression robustness, weak cross-dataset generalization, and high training overhead. To address these challenges, this paper proposes a deepfake detection approach that combines multiple parameter-efficient fine-tuning techniques. The method uses a vision Transformer pretrained with masked image modeling self-supervision as its backbone. First, it applies low-rank adaptation (LoRA) to fine-tune the self-attention module parameters of the pretrained model, and in parallel introduces convolutional adapters to capture local texture information, improving the model's adaptability to the deepfake detection task. Next, classical adapters are inserted in series to fine-tune the feed-forward network of the pretrained model, making maximal use of the knowledge acquired during pretraining. Finally, a multi-layer perceptron replaces the original classification head for deepfake detection. Experimental results on six datasets show that the model achieves an average frame-level AUC of approximately 0.996 with only 2×10^7 trainable parameters. In cross-compression experiments, the average frame-level AUC drop is 0.135, and in cross-dataset generalization experiments the frame-level AUC averages around 0.765.
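The fine-tuning structure described in the abstract can be illustrated with a minimal PyTorch-style sketch. The class names, LoRA rank, token grid size, and adapter bottleneck width below are illustrative assumptions and not taken from the paper; the sketch only shows the general shape of the three components: LoRA applied to a frozen attention projection, a parallel convolutional adapter over patch tokens for local texture, and a serial bottleneck adapter placed after the feed-forward network.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # update starts at zero, preserving the pretrained output
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


class ConvAdapter(nn.Module):
    """Parallel convolutional adapter: reshapes patch tokens into a 2-D grid and
    applies a depthwise conv to capture local texture cues (grid size is assumed)."""
    def __init__(self, dim: int, grid: int = 14):
        super().__init__()
        self.grid = grid
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, tokens):  # tokens: (B, N, C) patch tokens, N must equal grid*grid
        b, n, c = tokens.shape
        x = tokens.transpose(1, 2).reshape(b, c, self.grid, self.grid)
        x = self.conv(x)
        return x.flatten(2).transpose(1, 2)


class SerialAdapter(nn.Module):
    """Classical bottleneck adapter inserted after the frozen feed-forward network."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)  # residual branch starts as identity

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))


if __name__ == "__main__":
    # Toy check: 196 patch tokens (14x14 grid) with embedding dim 768.
    tokens = torch.randn(2, 196, 768)
    qkv = LoRALinear(nn.Linear(768, 768 * 3))      # LoRA on a fused QKV projection
    conv_branch = ConvAdapter(768, grid=14)        # parallel local-texture branch
    ffn_adapter = SerialAdapter(768)               # serial adapter after the FFN
    print(qkv(tokens).shape, conv_branch(tokens).shape, ffn_adapter(tokens).shape)
```

In this sketch only the LoRA matrices, the adapters, and a replacement classification head would be trainable, which is consistent with the paper's reported budget of roughly 2×10^7 trainable parameters; how the branches are wired into each Transformer block is an assumption here, not a specification from the source.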
- Published
- 2024