ICLR 2026 Orals

Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer

Tao Ren, Zishi Zhang, Jinyang Jiang, Zehao Li, Shentao Qin, Yi Zheng, Guanghao Li, Qianyou Sun, Yan Li, Jiafeng Liang, Xinping Li, Yijie Peng

LLMs & Reasoning · Thu, Apr 23 · 10:30–10:40 AM · Room 201 A/B · Avg rating: 6.67 (range 6–8)

Abstract

The probabilistic diffusion model (DM), which generates content through inference over a recursive chain structure, has emerged as a powerful framework for visual generation. After pre-training on enormous amounts of data, the model must be properly aligned to meet the requirements of downstream applications, making efficient alignment of the foundation DM a crucial task. Contemporary methods are based on either Reinforcement Learning (RL) or truncated Backpropagation (BP). However, RL suffers from low sample efficiency and truncated BP from biased gradient estimation, resulting in limited improvement or, even worse, complete training failure. To overcome these challenges, we propose the Recursive Likelihood Ratio (RLR) optimizer, a Half-Order (HO) fine-tuning paradigm for DMs. The HO gradient estimator enables rearrangement of the computation graph within the recursive diffusive chain, making the RLR gradient estimator unbiased with lower variance than that of other methods. We theoretically investigate the bias, variance, and convergence of our method. Extensive experiments on image and video generation validate the superiority of the RLR. Furthermore, we propose a novel prompting technique that pairs naturally with the RLR to achieve a synergistic effect. The implementation is available at https://github.com/RTkenny/RLR-Optimizer.
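For intuition, the sketch below shows the likelihood-ratio (score-function) component that estimators of this family build on: each stochastic transition of a toy reverse chain contributes a log-probability term, and the reward only weights the resulting surrogate loss, so no backpropagation through the chain (and no differentiable reward) is required. This is a minimal illustration, not the authors' RLR implementation; `ToyDenoiser`, `reward`, and all hyperparameters are hypothetical.

```python
# Minimal sketch of a likelihood-ratio (REINFORCE-style) gradient estimator
# over a recursive sampling chain. Illustrative only; not the RLR optimizer.
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Hypothetical stand-in for a denoiser: predicts the mean of the next state."""
    def __init__(self, dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, x: torch.Tensor, t: float) -> torch.Tensor:
        t_feat = torch.full((x.shape[0], 1), t)          # scalar time feature
        return self.net(torch.cat([x, t_feat], dim=-1))

def reward(x0: torch.Tensor) -> torch.Tensor:
    # Stand-in for a non-differentiable downstream reward (e.g., a preference score).
    return -x0.pow(2).sum(dim=-1)

def lr_gradient_update(model, optimizer, batch=64, steps=10, sigma=0.1, dim=8):
    x = torch.randn(batch, dim)                          # x_T: start of the reverse chain
    log_prob = torch.zeros(batch)
    for k in range(steps, 0, -1):
        mu = model(x, k / steps)
        dist = torch.distributions.Normal(mu, sigma)
        x = dist.sample()                                # stochastic step; graph is cut here
        log_prob = log_prob + dist.log_prob(x).sum(dim=-1)
    r = reward(x)
    advantage = r - r.mean()                             # simple baseline reduces variance
    loss = -(advantage.detach() * log_prob).mean()       # REINFORCE surrogate loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return r.mean().item()

model = ToyDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for it in range(5):
    print(f"iter {it}: mean reward = {lr_gradient_update(model, opt):.3f}")
```

Per the abstract, the paper's half-order estimator goes further than this pure likelihood-ratio form: it rearranges the computation graph within the recursive chain to combine such zeroth-order terms with first-order (backpropagated) ones, keeping the estimator unbiased while lowering its variance.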

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

Proposes the Recursive Likelihood Ratio optimizer for efficient fine-tuning of diffusion models via unbiased, lower-variance gradient estimation.

Contributions · Auto-generated by claude-haiku-4-5-20251001
  • Half-order fine-tuning paradigm with unbiased gradient estimator for diffusion models
  • Theoretical analysis of bias, variance, and convergence properties
  • Diffusive Chain-of-Thought prompt technique for synergistic improvements
  • Demonstrates superior performance on Text2Image and Text2Video tasks
Methods used · Auto-generated by claude-haiku-4-5-20251001
  • Recursive Likelihood Ratio optimizer
  • Flow matching
  • Gradient estimation
  • Reinforcement learning
  • Backpropagation
Datasets used · Auto-generated by claude-haiku-4-5-20251001
  • Text2Image datasets
  • Text2Video datasets
Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit limitations.

Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit future directions.

Author keywords

  • diffusion model
  • post-training
  • stochastic gradient estimation
