Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers
Shikang Zheng, Guantao Chen, Qinming Zhou, Yuqi Lin, Lixuan He, Chang Zou, Peiliang Cai, Jiacheng Liu, Linfeng Zhang
Abstract
Diffusion Transformers offer state-of-the-art fidelity in image and video synthesis, but their iterative sampling process remains a major bottleneck due to the high cost of transformer forward passes at each timestep. To mitigate this, feature caching has emerged as a training-free acceleration technique that reuses hidden representations. However, existing methods often apply a uniform caching strategy across all feature dimensions, ignoring their heterogeneous dynamic behaviors. We therefore adopt a new perspective, modeling hidden feature evolution as a mixture of ODEs across dimensions, and introduce HyCa, a hybrid-ODE-solver-inspired caching framework that applies dimension-wise caching strategies. HyCa achieves near-lossless acceleration across diverse tasks and models without retraining, including a 5.55× speedup on FLUX, 5.56× on HunyuanVideo, and 6.24× on Qwen-Image and Qwen-Image-Edit.
HyCa uses hybrid ODE solvers with dimension-wise caching strategies to accelerate diffusion transformers by 5-6x without retraining.
- Models hidden feature evolution as mixture of ODEs across dimensions in diffusion transformers
- Introduces HyCa framework applying dimension-wise caching strategies instead of uniform strategies
- Achieves 5.55x speedup on FLUX and 6.24x on Qwen-Image without retraining
- Demonstrates compatibility with distilled models and LoRA fine-tuning
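The dimension-wise idea above can be sketched in a few lines. This is a minimal, hypothetical illustration only: the clustering rule, the two solvers ("reuse" vs. linear extrapolation), and the threshold are our assumptions for exposition, not the paper's actual method.

```python
import numpy as np

# Hypothetical sketch of dimension-wise ("hybrid") feature caching: each hidden
# dimension is assigned a simple cache solver based on how fast it changes
# between timesteps. Names and thresholds below are illustrative assumptions.
#   solver 0, "reuse":       hold the last computed value (zeroth-order cache)
#   solver 1, "extrapolate": linear step from the last two values (first-order)

def assign_solvers(feat_prev, feat_curr, threshold=0.05):
    """Cluster dimensions by relative change between two timesteps."""
    delta = np.abs(feat_curr - feat_prev)
    scale = np.abs(feat_curr) + 1e-8
    fast = (delta / scale) > threshold   # rapidly changing dimensions
    return np.where(fast, 1, 0)          # 1 = extrapolate, 0 = reuse

def cached_step(feat_prev, feat_curr, solver_ids):
    """Predict the next-timestep feature without a transformer forward pass."""
    reuse = feat_curr                     # zeroth-order: hold the value
    extrap = 2 * feat_curr - feat_prev    # first-order: linear extrapolation
    return np.where(solver_ids == 1, extrap, reuse)

# Toy usage: one 8-dimensional feature at two consecutive timesteps.
f_prev = np.array([1.0, 1.0, 0.5, 2.0, 0.0, 1.0, 3.0, 0.2])
f_curr = np.array([1.0, 1.2, 0.5, 2.5, 0.0, 1.01, 3.0, 0.4])
ids = assign_solvers(f_prev, f_curr)
f_next = cached_step(f_prev, f_curr, ids)
```

The design point is that slowly varying dimensions can be held constant cheaply, while fast-moving dimensions get a higher-order predictor, which is what a uniform caching strategy cannot express.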
- ODE solvers
- Feature clustering
- Caching strategies
- Diffusion models
- FLUX text-to-image
- HunyuanVideo
- Qwen-Image
- Qwen-Image-Edit
The authors did not state explicit limitations.
- Extend the mixture-of-ODE perspective to other generative models (from the paper)
- Explore learning-based caching strategies to further enhance efficiency (from the paper)
Author keywords
- Generative models
- Efficient ML
- Diffusion Transformer Acceleration
- Feature Caching
Related orals
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Benchmarks practical privacy risks in differential-privacy-adapted LLMs, revealing that distribution shifts and model choice affect protection effectiveness.
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
Proposes Recursive Likelihood Ratio optimizer for efficient fine-tuning of diffusion models with lower variance gradient estimation.
Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
Demonstrates LLMs can be finetuned to generate harmful steganographically-hidden outputs while appearing benign to safety systems.
Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents
Proposes the T3 algorithm, which detects belief deviation in LLM agents and truncates trajectories to improve reinforcement learning in active reasoning tasks.
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
RefineStat enforces semantic constraints and applies diagnostic-aware refinement for synthesizing valid probabilistic programs from smaller language models.