ICLR 2026 Orals

Hallucination Begins Where Saliency Drops

Xiaofeng Zhang, Yuanchao Zhu, Chaochen Gu, Xiaosong Yuan, Qiyan Zhao, Jiawei Cao, Feilong Tang, Sinan Fan, Yaomin Shen, Chen Shen, Hao Tang

LLMs & Reasoning Fri, Apr 24 · 3:27 PM–3:37 PM · 202 A/B Avg rating: 6.00 (4–8)

Abstract

Recent studies have investigated attention dynamics in large vision language models (LVLMs), yet existing methods remain limited in reliably distinguishing hallucinated from correct outputs — primarily because they rely solely on forward-pass attention, ignoring gradient-based signals that reveal how token influence propagates through the model. To bridge this gap, we introduce LVLMs-Saliency, a gradient-aware diagnostic tool that quantifies the grounding strength of each output token by fusing attention weights with their gradients. Through analysis, we identify a decisive pattern: hallucinations occur when prior output tokens show low saliency to the next token prediction, indicating a failure of contextual memory. Building on this insight, we propose a dual-mechanism inference-time framework: (1) Saliency-Guided Rejection Sampling (SGRS), which dynamically filters candidate tokens during decoding by rejecting those with saliency below a context-adaptive threshold, thereby preventing coherence-breaking tokens from entering the sequence; and (2) Local Coherence Reinforcement (LocoRE), a lightweight plug-and-play module that strengthens attention from the current token to its most recent outputs, actively counteracting the "forgetting" behavior identified by LVLMs-Saliency. Experimental results demonstrate that our method significantly reduces hallucinations across multiple LVLMs, offering a robust and interpretable solution to improve model reliability.
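The abstract describes LVLMs-Saliency as fusing attention weights with their gradients to score how strongly prior tokens ground each prediction. A minimal sketch of one common attention-gradient fusion (the elementwise product |A · ∂L/∂A|, summed over heads) is below; the paper's exact formula is not given in the abstract, so this is an illustrative assumption, with mock attention and gradient tensors standing in for a real forward/backward pass.

```python
import numpy as np

def saliency_scores(attention, attention_grad):
    """Fuse attention weights with their gradients into a saliency matrix,
    in the spirit of LVLMs-Saliency. Entry [i, j] of the result estimates
    how strongly token j grounds the prediction at position i.

    attention, attention_grad: arrays of shape (heads, seq, seq), e.g. the
    attention probabilities of one layer and the loss gradient w.r.t. them.
    """
    fused = np.abs(attention * attention_grad)  # elementwise attention-gradient fusion
    return fused.sum(axis=0)                    # aggregate over heads

# Toy example: 2 heads, 3 tokens, random stand-ins for a real model's tensors.
rng = np.random.default_rng(0)
A = rng.random((2, 3, 3))            # mock attention weights
G = rng.standard_normal((2, 3, 3))   # mock gradients of the loss w.r.t. A
S = saliency_scores(A, G)
print(S.shape)  # (3, 3)
```

Row i of `S` can then be inspected at decoding time: uniformly low saliency over the prior output tokens is the "contextual forgetting" pattern the paper associates with hallucination.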

One-sentence summary·Auto-generated by claude-haiku-4-5-20251001

Gradient-aware diagnostic tool using saliency to identify hallucination patterns, proposing SGRS and LocoRE interventions to reduce output errors.

Contributions·Auto-generated by claude-haiku-4-5-20251001
  • Gradient-aware saliency tool LVLMs-Saliency quantifying grounding strength of output tokens
  • Finding that hallucinations occur when prior tokens show low saliency to next token prediction
  • Dual-mechanism framework: SGRS for token filtering and LocoRE for attention reinforcement
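The SGRS contribution above rejects candidate next tokens whose saliency falls below a context-adaptive threshold. One way such a filter could work is sketched below; the threshold rule (mean minus a multiple of the standard deviation of recently accepted tokens' saliencies) and the fallback behavior are my assumptions, since the abstract does not specify them.

```python
import numpy as np

def sgrs_filter(candidate_saliencies, context_saliencies, k=1.0):
    """Saliency-Guided Rejection Sampling, sketched: keep only candidate
    next tokens whose saliency clears a context-adaptive threshold.

    candidate_saliencies: saliency of each candidate next token w.r.t. the
      prior outputs (e.g. a row of the LVLMs-Saliency matrix per candidate).
    context_saliencies: saliency values of recently accepted tokens, used
      to set the adaptive threshold (illustrative rule: mean - k * std).
    Returns the indices of surviving candidates.
    """
    ctx = np.asarray(context_saliencies, dtype=float)
    threshold = ctx.mean() - k * ctx.std()
    cand = np.asarray(candidate_saliencies, dtype=float)
    keep = np.nonzero(cand >= threshold)[0]
    # Fallback assumption: if every candidate is rejected, keep the single
    # most salient one so decoding can always continue.
    if keep.size == 0:
        keep = np.array([int(cand.argmax())])
    return keep

# A weakly grounded candidate (saliency 0.1) is rejected relative to a
# context whose recent saliencies cluster around 0.7.
print(sgrs_filter([0.9, 0.1, 0.6], [0.7, 0.8, 0.6]))
```

In an actual decoder, sampling would then be restricted (or re-normalized) to the surviving candidate set, which is what keeps coherence-breaking tokens out of the sequence.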
Methods used·Auto-generated by claude-haiku-4-5-20251001
  • saliency analysis
  • gradient-based diagnosis
  • rejection sampling
  • attention reinforcement
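The attention-reinforcement method (LocoRE) strengthens the current token's attention to its most recent outputs. A plausible minimal realization, assumed here since the abstract gives no formula, is to add a positive bias to the pre-softmax attention scores of the last few positions; `window` and `bias` are illustrative hyperparameters.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def locore_boost(attn_logits, window=3, bias=1.0):
    """Local Coherence Reinforcement, sketched: bias the current token's
    attention toward its `window` most recent predecessors before the
    softmax, counteracting the "forgetting" pattern LVLMs-Saliency flags.

    attn_logits: (seq,) pre-softmax attention scores of the current token
      over all previous positions.
    """
    logits = np.asarray(attn_logits, dtype=float).copy()
    logits[-window:] += bias  # reinforce the most recent output tokens
    return softmax(logits)

# With uniform logits, the boosted positions end up with higher weight.
w = locore_boost(np.zeros(5), window=2, bias=1.0)
print(w)
```

Because the bias is applied per decoding step and needs no retraining, this matches the "lightweight plug-and-play" framing in the abstract.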
Limitations (author-stated)·Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit limitations.

Future work (author-stated)·Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit future directions.

Author keywords

  • LVLMs-Saliency; Saliency-Guided Rejection Sampling; Local Coherence Reinforcement; Hallucination
