Hallucination Begins Where Saliency Drops
Xiaofeng Zhang, Yuanchao Zhu, Chaochen Gu, Xiaosong Yuan, Qiyan Zhao, Jiawei Cao, Feilong Tang, Sinan Fan, Yaomin Shen, Chen Shen, Hao Tang
Abstract
Recent studies have investigated attention dynamics in large vision language models (LVLMs), yet existing methods remain limited in reliably distinguishing hallucinated from correct outputs — primarily because they rely solely on forward-pass attention, ignoring gradient-based signals that reveal how token influence propagates through the model. To bridge this gap, we introduce LVLMs-Saliency, a gradient-aware diagnostic tool that quantifies the grounding strength of each output token by fusing attention weights with their gradients. Through analysis, we identify a decisive pattern: hallucinations occur when prior output tokens show low saliency to the next token prediction, indicating a failure of contextual memory. Building on this insight, we propose a dual-mechanism inference-time framework: (1) Saliency-Guided Rejection Sampling (SGRS), which dynamically filters candidate tokens during decoding by rejecting those with saliency below a context-adaptive threshold, thereby preventing coherence-breaking tokens from entering the sequence; and (2) Local Coherence Reinforcement (LocoRE), a lightweight plug-and-play module that strengthens attention from the current token to its most recent outputs, actively counteracting the “forgetting” behavior identified by LVLMs-Saliency. Experimental results demonstrate that our method significantly reduces hallucinations across multiple LVLMs, offering a robust and interpretable solution to improve model reliability.
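The abstract describes fusing attention weights with their gradients to score how strongly each prior token grounds the next prediction, but gives no formula. The sketch below is a minimal illustration of one common attention-times-gradient fusion (elementwise product, positive part, head average); the exact fusion rule, function name `token_saliency`, and the toy tensors are assumptions, not the paper's definition. Gradients are taken as precomputed inputs here rather than obtained via autograd.

```python
import numpy as np

def token_saliency(attn, attn_grad):
    # Fuse attention weights with their gradients elementwise, keep the
    # positive (supporting) contributions, and average over heads.
    # attn and attn_grad share shape (num_heads, seq_len, seq_len).
    fused = np.maximum(attn * attn_grad, 0.0)
    return fused.mean(axis=0)  # -> (seq_len, seq_len) saliency map

# Toy example: 2 identical heads over 3 tokens, unit gradients.
attn = np.array([[[1.0, 0.0, 0.0],
                  [0.5, 0.5, 0.0],
                  [0.2, 0.3, 0.5]]] * 2)
grad = np.ones_like(attn)
sal = token_saliency(attn, grad)

# Grounding strength of the next prediction on all prior output tokens:
# the abstract's finding is that hallucination onset coincides with this
# quantity dropping low.
prior_saliency = sal[-1, :-1].sum()
```

Under this toy setup a low `prior_saliency` would flag the step as hallucination-prone.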
Gradient-aware diagnostic tool using saliency to identify hallucination patterns, proposing SGRS and LocoRE interventions to reduce output errors.
- Gradient-aware saliency tool LVLMs-Saliency quantifying grounding strength of output tokens
- Finding that hallucinations occur when prior tokens show low saliency to next token prediction
- Dual-mechanism framework: SGRS for token filtering and LocoRE for attention reinforcement
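The two mechanisms above can be sketched as decode-time operations. The abstract specifies neither the threshold rule for SGRS nor the form of LocoRE's attention strengthening, so everything below is illustrative: the mean-minus-k-std threshold, the additive recency bias, and the names `sgrs_filter` and `locore_bias` are all assumptions.

```python
import numpy as np

def sgrs_filter(logits, cand_saliency, recent_saliency, k=1.0):
    """Saliency-Guided Rejection Sampling sketch: mask candidate tokens
    whose saliency falls below a context-adaptive threshold, taken here
    as mean - k*std of recent token saliencies (assumed rule)."""
    thr = recent_saliency.mean() - k * recent_saliency.std()
    mask = cand_saliency >= thr
    if not mask.any():  # never reject every candidate
        mask = cand_saliency == cand_saliency.max()
    return np.where(mask, logits, -np.inf)

def locore_bias(attn_scores, window=4, strength=1.0):
    """Local Coherence Reinforcement sketch: additively boost the current
    token's attention scores toward its most recent outputs (assumed
    additive form of the 'strengthening' described in the abstract)."""
    boosted = attn_scores.copy()
    boosted[-window:] += strength
    return boosted

# Toy decoding step with 3 candidate tokens.
logits = np.array([2.0, 3.0, 1.0])
cand_sal = np.array([0.9, 0.1, 0.6])     # candidate 1 is weakly grounded
recent_sal = np.array([0.5, 0.7, 0.6])
filtered = sgrs_filter(logits, cand_sal, recent_sal)
chosen = int(np.argmax(filtered))        # weakly grounded token is rejected

# LocoRE: bias attention toward the 3 most recent positions.
scores = np.zeros(8)
boosted = locore_bias(scores, window=3, strength=0.5)
```

In this toy step the highest-logit candidate is rejected for low saliency, and the recency bias raises only the last `window` attention scores, matching the intent (not necessarily the implementation) of the two mechanisms.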
- saliency analysis
- gradient-based diagnosis
- rejection sampling
- attention reinforcement
Authors did not state explicit limitations.
Authors did not state explicit future directions.
Author keywords
- LVLMs-Saliency; Saliency-Guided Rejection Sampling; Local Coherence Reinforcement; Hallucination
Related orals
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Benchmarks practical privacy risks in differential-privacy-adapted LLMs, revealing that distribution shifts and model choice impact protection effectiveness.
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
Proposes Recursive Likelihood Ratio optimizer for efficient fine-tuning of diffusion models with lower variance gradient estimation.
Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
Demonstrates LLMs can be finetuned to generate harmful steganographically-hidden outputs while appearing benign to safety systems.
Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents
Proposes T3 algorithm to detect belief deviation in LLM agents and truncate trajectories for improved reinforcement learning in active reasoning tasks.
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
RefineStat enforces semantic constraints and applies diagnostic-aware refinement for synthesizing valid probabilistic programs from smaller language models.