ICLR 2026 Orals

Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding

Yuxuan Zhou, Fei Huang, Heng Li, Fengyi Wu, Tianyu Wang, Jianwei Zhang, Junyang Lin, Zhi-Qi Cheng

Efficiency, Systems & Kernels · Sat, Apr 25 · 11:18 AM–11:28 AM · Amphitheater · Avg rating: 5.00 (0–8)

Abstract

Verification is a key bottleneck in improving inference speed while maintaining distribution fidelity in Speculative Decoding. Recent work has shown that sequence-level verification leads to a higher number of accepted tokens compared to token-wise verification. However, existing solutions often rely on surrogate approximations or are constrained by partial information, struggling with joint intractability. In this work, we propose Hierarchical Speculative Decoding (HSD), a provably lossless verification method that significantly boosts the expected number of accepted tokens and overcomes joint intractability by balancing excess and deficient probability mass across accessible branches. Our extensive large-scale experiments demonstrate that HSD yields consistent improvements in acceptance rates across diverse model families and benchmarks. Moreover, its strong explainability and generality make it readily integrable into a wide range of speculative decoding frameworks. Notably, integrating HSD into EAGLE-3 yields over a 12% performance gain, establishing state-of-the-art decoding efficiency without compromising distribution fidelity. Code is available at https://github.com/ZhouYuxuanYX/Hierarchical-Speculative-Decoding.
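
For context, the token-wise baseline that the abstract contrasts with sequence-level verification can be sketched as below. This is the standard speculative sampling accept/reject rule, not HSD itself; the function names (`verify_tokenwise`, `excess_and_deficit`) and the list-of-probabilities inputs are illustrative assumptions, not taken from the authors' repository.

```python
import random


def verify_tokenwise(draft_tokens, q_dists, p_dists, rng=random):
    """Token-wise lossless verification (standard baseline, NOT HSD).

    draft_tokens: token ids proposed by the draft model.
    q_dists[i]:   draft distribution (list of probabilities) at position i.
    p_dists[i]:   target distribution at position i; p_dists carries one
                  extra entry used for the bonus token if all drafts pass.
    Returns the tokens emitted in this verification step.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        p, q = p_dists[i], q_dists[i]
        # Accept the draft token with probability min(1, p(tok) / q(tok)).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
            continue
        # Rejected: resample from the normalized residual max(p - q, 0),
        # i.e. the mass the draft under-covers, then stop.
        residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
        out.append(rng.choices(range(len(p)), weights=residual)[0])
        return out
    # Every draft token was accepted: sample one bonus token from the target.
    last = p_dists[-1]
    out.append(rng.choices(range(len(last)), weights=last)[0])
    return out


def excess_and_deficit(p, q):
    """Mass where the draft over-covers (q > p) and under-covers (p > q)."""
    excess = sum(max(qi - pi, 0.0) for pi, qi in zip(p, q))
    deficit = sum(max(pi - qi, 0.0) for pi, qi in zip(p, q))
    return excess, deficit
```

Because both distributions sum to one, the excess and deficient mass at a single position are always equal, which is what lets the residual resampling step restore the target distribution exactly. According to the abstract, HSD applies a balancing of excess and deficient mass across accessible branches at the sequence level; that hierarchical step is not implemented in this sketch.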

One-sentence summary·Auto-generated by claude-haiku-4-5-20251001

Hierarchical Speculative Decoding is a provably lossless verification method that increases the expected number of accepted tokens while preserving target distribution fidelity.

Contributions·Auto-generated by claude-haiku-4-5-20251001
  • Proposes HSD, a provably lossless verification method that balances excess and deficient probability mass across accessible branches
  • Overcomes joint intractability with a hierarchical approach rather than relying on surrogate approximations
  • Demonstrates a performance gain of over 12% when integrated into EAGLE-3, without compromising distribution fidelity
Methods used·Auto-generated by claude-haiku-4-5-20251001
  • Speculative decoding
  • Hierarchical verification
  • Probability distribution balancing
Limitations (author-stated)·Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit limitations.

Future work (author-stated)·Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit future directions.

Author keywords

  • Speculative Decoding
  • Joint Intractability
  • Lossless Verification
