ICLR 2026 Orals

Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning

Hao Tan, Jun Lan, Zichang Tan, Senyuan Shi, Ajian Liu, Chuanbiao Song, Huijia Zhu, Weiqiang Wang, Jun Wan, Zhen Lei

LLMs & Reasoning · Thu, Apr 23 · 3:27 PM–3:37 PM · 202 A/B · Avg rating: 6.50 (4–8)
Author-provided TL;DR

We introduce an MLLM-based detector for transparent deepfake detection, along with a holistic dataset for deepfake detection.

Abstract

Deepfake detection remains a formidable challenge due to the evolving nature of fake content in real-world scenarios. However, existing benchmarks diverge sharply from industrial practice, typically featuring homogeneous training sources and low-quality testing images, which hinders the practical use of current detectors. To bridge this gap, we introduce **HydraFake**, a dataset that contains diversified deepfake techniques and in-the-wild forgeries, along with rigorous training and evaluation protocols covering unseen model architectures, emerging forgery techniques, and novel data domains. Building on this resource, we propose **Veritas**, a multi-modal large language model (MLLM) based deepfake detector. In contrast to vanilla chain-of-thought (CoT), we introduce *pattern-aware reasoning*, which involves critical patterns such as "planning" and "self-reflection" to emulate the human forensic process. We further propose a two-stage training pipeline to seamlessly internalize such deepfake reasoning capacities into current MLLMs. Experiments on the HydraFake dataset reveal that although previous detectors generalize well in cross-model scenarios, they fall short on unseen forgeries and data domains. Veritas achieves significant gains across different out-of-domain (OOD) scenarios and is capable of delivering transparent and faithful detection outputs.

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

Veritas, an MLLM-based deepfake detector, uses pattern-aware reasoning to achieve superior generalization across unseen forgery techniques and data domains.

Contributions · Auto-generated by claude-haiku-4-5-20251001
  • HydraFake dataset with diversified deepfake techniques and in-the-wild forgeries for comprehensive evaluation
  • Veritas MLLM-based deepfake detector with pattern-aware reasoning
  • Two-stage training pipeline for internalizing deepfake reasoning capacities
Methods used · Auto-generated by claude-haiku-4-5-20251001
  • Multimodal LLMs
  • Chain-of-thought reasoning
  • Pattern-aware reasoning
  • Deepfake detection
Datasets used · Auto-generated by claude-haiku-4-5-20251001
  • HydraFake
Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit limitations.

Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit future directions.

Author keywords

  • Deepfake Detection
  • MLLMs

