Diffusion Language Model Knows the Answer Before It Decodes

Pengxiang Li, Yefan Zhou, Dilxat Muhtar, Lu Yin, Shilin Yan, Li Shen, Yi Liang, Soroush Vosoughi, Shiwei Liu

LLMs & Reasoning · Sat, Apr 25 · 10:30 AM–10:40 AM · Amphitheater · Avg rating: 6.50 (4–8)

Abstract

Diffusion language models (DLMs) have recently emerged as an alternative to autoregressive approaches, offering parallel sequence generation and flexible token orders. However, their inference remains slower than that of autoregressive models, primarily due to the cost of bidirectional attention and the large number of refinement steps required for high-quality outputs. In this work, we highlight and leverage an overlooked property of DLMs, **early answer convergence**: in many cases, the correct answer can be internally identified after only half of the refinement steps, well before the final decoding step, under both semi-autoregressive and random remasking schedules. For example, on GSM8K and MMLU, up to 97% and 99% of instances, respectively, can be decoded correctly using only half of the refinement steps. Building on this observation, we introduce **Prophet**, a training-free fast decoding paradigm that enables **early commit decoding**. Specifically, Prophet dynamically decides whether to continue refinement or to go "all-in" (i.e., decode all remaining tokens in one step), using the confidence gap between the top-2 prediction candidates as the criterion. It integrates seamlessly into existing DLM implementations, incurs negligible overhead, and requires no additional training. Empirical evaluations on LLaDA-8B and Dream-7B across multiple tasks show that Prophet reduces the number of decoding steps by up to 3.4× while preserving high generation quality, and yields additional speedups when combined with existing acceleration methods. These results recast DLM decoding as a problem of *when to stop sampling*, and demonstrate that early answer convergence provides a simple yet powerful mechanism for accelerating DLMs on reasoning, code, and planning tasks with identifiable answer regions. Our code is available at https://github.com/pixeli99/Prophet.
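The early-commit decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It makes several assumptions the abstract does not specify: the gap is computed on softmax probabilities, it is averaged over all answer positions, and a single fixed `threshold` is used. The names `step_fn`, `should_commit`, `decode_with_early_commit`, and `toy_step` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top2_gap(probs):
    """Difference between the largest and second-largest probability."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


def should_commit(logits_per_position, threshold):
    """Assumed rule: commit when the mean top-2 gap reaches the threshold."""
    gaps = [top2_gap(softmax(row)) for row in logits_per_position]
    return sum(gaps) / len(gaps) >= threshold


def decode_with_early_commit(step_fn, num_steps, threshold):
    """Refine step by step; go "all-in" as soon as the gap criterion holds.

    step_fn(step) is a hypothetical stand-in for one refinement pass of a
    DLM and must return one list of logits per answer position.
    Returns (token_ids, steps_used).
    """
    for step in range(1, num_steps + 1):
        logits = step_fn(step)
        if should_commit(logits, threshold) or step == num_steps:
            # Decode every position in one shot by taking the argmax.
            tokens = [max(range(len(row)), key=row.__getitem__) for row in logits]
            return tokens, step


def toy_step(step):
    """Toy model whose predictions sharpen linearly with the step index."""
    return [[float(step), 0.0, 0.0], [0.0, float(step), 0.0]]


tokens, steps_used = decode_with_early_commit(toy_step, num_steps=10, threshold=0.8)
print(tokens, steps_used)
```

With the toy model, the mean gap first exceeds 0.8 at the third step, so decoding stops there instead of running all ten steps. If the threshold is never reached, the loop falls back to ordinary full-length decoding.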

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

Prophet exploits early answer convergence in diffusion language models to reduce the number of decoding steps by up to 3.4× without additional training.

Contributions · Auto-generated by claude-haiku-4-5-20251001

Identifies early answer convergence in diffusion language models: the correct answer often emerges after only half of the refinement steps
Introduces Prophet, a training-free fast decoding paradigm that uses the top-2 confidence gap to decide when to commit early
Reduces decoding steps by up to 3.4× while preserving generation quality, without additional training
Demonstrates compatibility with distillation and cache-based acceleration methods
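One way to quantify early answer convergence is to record the answer decoded at every refinement step and count how many instances have already settled on the correct answer by the halfway point. The sketch below assumes that definition of "converged" (correct at some step and at every later step); the paper's exact measurement protocol may differ, and the function names are hypothetical.

```python
def earliest_stable_step(answers_by_step, reference):
    """Return the first 1-based step from which the answer equals
    `reference` and stays equal through the last step, or None."""
    earliest = None
    for step, answer in enumerate(answers_by_step, start=1):
        if answer == reference:
            if earliest is None:
                earliest = step
        else:
            # The answer changed away from the reference; reset.
            earliest = None
    return earliest


def fraction_converged_by_half(traces, references):
    """Share of instances whose answer is stably correct by half the steps."""
    hits = 0
    for answers, reference in zip(traces, references):
        step = earliest_stable_step(answers, reference)
        if step is not None and step <= len(answers) / 2:
            hits += 1
    return hits / len(traces)


# Made-up traces for illustration only, not data from the paper.
traces = [
    ["7", "18", "18", "18"],  # stably correct from step 2 of 4
    ["3", "3", "5", "5"],     # stably correct from step 3 of 4
    ["a", "b", "c", "d"],     # correct only at the final step
]
references = ["18", "5", "d"]
print(fraction_converged_by_half(traces, references))
```

Only the first toy trace counts as converged by the halfway point, so the reported fraction is one third.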

Methods used · Auto-generated by claude-haiku-4-5-20251001

Diffusion language models
Early stopping criteria
Confidence scoring

Datasets used · Auto-generated by claude-haiku-4-5-20251001

GSM8K
MMLU
HumanEval
Sudoku

Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Designed for tasks with identifiable answer regions; less suitable for open-ended generation (from the paper)
More conservative speedups on complex tasks like code generation compared to short-answer tasks (from the paper)

Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Develop learnable judge-based termination criteria for improved robustness in tasks where confidence does not correlate with correctness (from the paper)
Explore system-level optimizations with KV cache frameworks for immediate inference termination (from the paper)

Author keywords

diffusion language model
discrete

Related orals

Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models

Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer

Invisible Safety Threat: Malicious Finetuning for LLM via Steganography

Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents

RefineStat: Efficient Exploration for Probabilistic Program Synthesis