Back to Search
Start Over
MVTamperBench: Evaluating Robustness of Vision-Language Models
- Publication Year :
- 2024
-
Abstract
- Multimodal Large Language Models (MLLMs) have driven major advances in video understanding, yet their vulnerability to adversarial tampering and manipulations remains underexplored. To address this gap, we introduce MVTamperBench, a benchmark that systematically evaluates MLLM robustness against five prevalent tampering techniques: rotation, masking, substitution, repetition, and dropping. Built from 3.4K original videos-expanded to over 17K tampered clips spanning 19 video tasks. MVTamperBench challenges models to detect manipulations in spatial and temporal coherence. We evaluate 45 recent MLLMs from 15+ model families, revealing substantial variability in resilience across tampering types and showing that larger parameter counts do not necessarily guarantee robustness. MVTamperBench sets a new benchmark for developing tamper-resilient MLLM in safety-critical applications, including detecting clickbait, preventing harmful content distribution, and enforcing policies on media platforms. We release all code and data to foster open research in trustworthy video understanding. Code: https://amitbcp.github.io/MVTamperBench/ Data: https://huggingface.co/datasets/Srikant86/MVTamperBench
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.2412.19794
- Document Type :
- Working Paper