ICLR 2026 Orals

Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer

Tao Ren, Zishi Zhang, Jinyang Jiang, Zehao Li, Shentao Qin, Yi Zheng, Guanghao Li, Qianyou Sun, Yan Li, Jiafeng Liang, Xinping Li, Yijie Peng

LLMs & Reasoning · Thu, Apr 23 · 10:30–10:40 AM · Room 201 A/B · Avg rating: 6.67 (range 6–8)

Abstract

The probabilistic diffusion model (DM), which generates content through inference over a recursive chain structure, has emerged as a powerful framework for visual generation. After pre-training on enormous amounts of data, the model must be properly aligned to meet the requirements of downstream applications, making efficient alignment of the foundation DM a crucial task. Contemporary methods are based on either Reinforcement Learning (RL) or truncated Backpropagation (BP). However, RL suffers from low sample efficiency and truncated BP from biased gradient estimation, resulting in limited improvement or, even worse, complete training failure. To overcome these challenges, we propose the Recursive Likelihood Ratio (RLR) optimizer, a Half-Order (HO) fine-tuning paradigm for DMs. The HO gradient estimator enables rearrangement of the computation graph within the recursive diffusive chain, making the RLR gradient estimator unbiased with lower variance than that of other methods. We theoretically investigate the bias, variance, and convergence of our method. Extensive experiments on image and video generation validate the superiority of the RLR. Furthermore, we propose a novel prompting technique that pairs naturally with the RLR to achieve a synergistic effect. The implementation is available at https://github.com/RTkenny/RLR-Optimizer.
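For intuition, the sketch below shows the likelihood-ratio (score-function) component that estimators of this family build on: each stochastic transition of a toy reverse chain contributes a log-probability term, and the reward only weights the resulting surrogate loss, so no backpropagation through the chain (and no differentiable reward) is required. This is a minimal illustration, not the authors' RLR implementation; `ToyDenoiser`, `reward`, and all hyperparameters are hypothetical.

```python
# Minimal sketch of a likelihood-ratio (REINFORCE-style) gradient estimator
# over a recursive sampling chain. Illustrative only; not the RLR optimizer.
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Hypothetical stand-in for a denoiser: predicts the mean of the next state."""
    def __init__(self, dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, x: torch.Tensor, t: float) -> torch.Tensor:
        t_feat = torch.full((x.shape[0], 1), t)          # scalar time feature
        return self.net(torch.cat([x, t_feat], dim=-1))

def reward(x0: torch.Tensor) -> torch.Tensor:
    # Stand-in for a non-differentiable downstream reward (e.g., a preference score).
    return -x0.pow(2).sum(dim=-1)

def lr_gradient_update(model, optimizer, batch=64, steps=10, sigma=0.1, dim=8):
    x = torch.randn(batch, dim)                          # x_T: start of the reverse chain
    log_prob = torch.zeros(batch)
    for k in range(steps, 0, -1):
        mu = model(x, k / steps)
        dist = torch.distributions.Normal(mu, sigma)
        x = dist.sample()                                # stochastic step; graph is cut here
        log_prob = log_prob + dist.log_prob(x).sum(dim=-1)
    r = reward(x)
    advantage = r - r.mean()                             # simple baseline reduces variance
    loss = -(advantage.detach() * log_prob).mean()       # REINFORCE surrogate loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return r.mean().item()

model = ToyDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for it in range(5):
    print(f"iter {it}: mean reward = {lr_gradient_update(model, opt):.3f}")
```

Per the abstract, the paper's half-order estimator goes further than this pure likelihood-ratio form: it rearranges the computation graph within the recursive chain to combine such zeroth-order terms with first-order (backpropagated) ones, keeping the estimator unbiased while lowering its variance.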

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

Proposes the Recursive Likelihood Ratio optimizer for efficient fine-tuning of diffusion models via unbiased, lower-variance gradient estimation.

Contributions · Auto-generated by claude-haiku-4-5-20251001
  • Half-order fine-tuning paradigm with unbiased gradient estimator for diffusion models
  • Theoretical analysis of bias, variance, and convergence properties
  • Diffusive Chain-of-Thought prompt technique for synergistic improvements
  • Demonstrates superior performance on Text2Image and Text2Video tasks
Methods used · Auto-generated by claude-haiku-4-5-20251001
  • Recursive Likelihood Ratio optimizer
  • Flow matching
  • Gradient estimation
  • Reinforcement learning
  • Backpropagation
Datasets used · Auto-generated by claude-haiku-4-5-20251001
  • Text2Image datasets
  • Text2Video datasets
Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit limitations.

Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit future directions.

Author keywords

  • diffusion model
  • post-training
  • stochastic gradient estimation
