Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning

Hao Tan, Jun Lan, Zichang Tan, Senyuan Shi, Ajian Liu, Chuanbiao Song, Huijia Zhu, Weiqiang Wang, Jun Wan, Zhen Lei

LLMs & Reasoning · Thu, Apr 23 · 3:27 PM–3:37 PM · 202 A/B · Avg rating: 6.50 (4–8)

Author-provided TL;DR

We introduce an MLLM-based detector for transparent deepfake detection, along with a holistic dataset for deepfake detection.

Abstract

Deepfake detection remains a formidable challenge due to the evolving nature of fake content in real-world scenarios. However, existing benchmarks diverge sharply from industrial practice, typically featuring homogeneous training sources and low-quality test images, which hinders the practical use of current detectors. To close this gap, we introduce **HydraFake**, a dataset that contains diverse deepfake techniques and in-the-wild forgeries, along with a rigorous training and evaluation protocol covering unseen model architectures, emerging forgery techniques, and novel data domains. Building on this resource, we propose **Veritas**, a multi-modal large language model (MLLM) based deepfake detector. Unlike vanilla chain-of-thought (CoT), we introduce *pattern-aware reasoning*, which involves critical patterns such as "planning" and "self-reflection" to emulate the human forensic process. We further propose a two-stage training pipeline to seamlessly internalize such deepfake reasoning capacities into current MLLMs. Experiments on the HydraFake dataset reveal that although previous detectors generalize well in cross-model scenarios, they fall short on unseen forgeries and data domains. Veritas achieves significant gains across different out-of-domain (OOD) scenarios and is capable of delivering transparent and faithful detection outputs.

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

Veritas, an MLLM-based deepfake detector, uses pattern-aware reasoning to generalize better to unseen forgery techniques and data domains.

Contributions · Auto-generated by claude-haiku-4-5-20251001

HydraFake dataset with diversified deepfake techniques and in-the-wild forgeries for comprehensive evaluation
Veritas MLLM-based deepfake detector with pattern-aware reasoning
Two-stage training pipeline for internalizing deepfake reasoning capacities

Methods used · Auto-generated by claude-haiku-4-5-20251001

Multimodal LLMs
Chain-of-thought reasoning
Pattern-aware reasoning
Deepfake detection

Datasets used · Auto-generated by claude-haiku-4-5-20251001

HydraFake

Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit limitations.

Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit future directions.

Author keywords

Deepfake Detection
MLLMs

Related orals

Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models

Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer

Invisible Safety Threat: Malicious Finetuning for LLM via Steganography

Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents

RefineStat: Efficient Exploration for Probabilistic Program Synthesis