Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding

Yuxuan Zhou, Fei Huang, Heng Li, Fengyi Wu, Tianyu Wang, Jianwei Zhang, Junyang Lin, Zhi-Qi Cheng

Efficiency, Systems & Kernels · Sat, Apr 25 · 11:18 AM–11:28 AM · Amphitheater · Avg rating: 5.00 (0–8)

Abstract

Verification is a key bottleneck in improving inference speed while maintaining distribution fidelity in Speculative Decoding. Recent work has shown that sequence-level verification leads to a higher number of accepted tokens compared to token-wise verification. However, existing solutions often rely on surrogate approximations or are constrained by partial information, struggling with joint intractability. In this work, we propose Hierarchical Speculative Decoding (HSD), a provably lossless verification method that significantly boosts the expected number of accepted tokens and overcomes joint intractability by balancing excess and deficient probability mass across accessible branches. Our extensive large-scale experiments demonstrate that HSD yields consistent improvements in acceptance rates across diverse model families and benchmarks. Moreover, its strong explainability and generality make it readily integrable into a wide range of speculative decoding frameworks. Notably, integrating HSD into EAGLE-3 yields over a 12% performance gain, establishing state-of-the-art decoding efficiency without compromising distribution fidelity. Code is available at https://github.com/ZhouYuxuanYX/Hierarchical-Speculative-Decoding.
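For readers unfamiliar with the baseline the abstract contrasts against, the following is a minimal sketch of standard token-wise speculative sampling verification. It is not HSD and is not taken from the authors' repository. The function names (`verify_token`, `sample_output`, `empirical_distribution`) and the toy distributions are hypothetical, chosen only for illustration. The sketch shows how a drafted token is accepted with probability min(1, p/q) and, on rejection, replaced by a sample from the renormalized deficient mass max(p - q, 0), which is what keeps the output distributed according to the target model.

```python
import random


def verify_token(p, q, draft_token, rng):
    """Token-wise check of one drafted token (baseline, not HSD).

    p: target-model probabilities over the vocabulary.
    q: draft-model probabilities over the same vocabulary.
    draft_token: index that was sampled from q.
    Returns (accepted, token).
    """
    # Accept the draft with probability min(1, p/q).
    ratio = p[draft_token] / q[draft_token]
    if rng.random() < min(1.0, ratio):
        return True, draft_token
    # Rejected: the draft put excess mass on this token, so resample
    # from the tokens where the target has more mass than the draft.
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    token = rng.choices(range(len(p)), weights=residual, k=1)[0]
    return False, token


def sample_output(p, q, rng):
    """Draw a draft token from q, then verify it against p."""
    draft = rng.choices(range(len(q)), weights=q, k=1)[0]
    _, token = verify_token(p, q, draft, rng)
    return token


def empirical_distribution(p, q, n, seed=0):
    """Estimate the output distribution of draft-then-verify over n trials."""
    rng = random.Random(seed)
    counts = [0] * len(p)
    for _ in range(n):
        counts[sample_output(p, q, rng)] += 1
    return [c / n for c in counts]


if __name__ == "__main__":
    p = [0.6, 0.3, 0.1]  # toy target distribution (assumed)
    q = [0.2, 0.5, 0.3]  # toy draft distribution (assumed)
    print(empirical_distribution(p, q, 100_000))
```

With enough trials the printed frequencies should approach the target distribution p rather than the draft distribution q, which is the sense in which this kind of verification is lossless. Sequence-level methods such as the one described in the abstract aim to accept more tokens per step while keeping the same guarantee.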

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

Hierarchical Speculative Decoding is a provably lossless verification method that increases the expected number of accepted tokens while preserving target distribution fidelity.

Contributions · Auto-generated by claude-haiku-4-5-20251001

Proposes HSD, a provably lossless verification method that balances excess and deficient probability mass across accessible branches
Overcomes joint intractability with a hierarchical approach that avoids surrogate approximations
Demonstrates a performance gain of over 12% when integrated into EAGLE-3, without compromising distribution fidelity

Methods used · Auto-generated by claude-haiku-4-5-20251001

Speculative decoding
Hierarchical verification
Probability distribution balancing

Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit limitations.

Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit future directions.

Author keywords

Speculative Decoding
Joint Intractability
Lossless Verification

Related orals

TileLang: Bridge Programmability and Performance in Modern Neural Kernels

Probabilistic Kernel Function for Fast Angle Testing

SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

Efficient Resource-Constrained Training of Transformers via Subspace Optimization

Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention