ICLR 2026 Orals

Exploratory Diffusion Model for Unsupervised Reinforcement Learning

Chengyang Ying, Huayu Chen, Xinning Zhou, Zhongkai Hao, Hang Su, Jun Zhu

Reinforcement Learning & Agents Fri, Apr 24 · 3:39 PM–3:49 PM · 201 A/B Avg rating: 6.00 (6–6)
Author-provided TL;DR

We propose the Exploratory Diffusion Model (ExDM), which boosts unsupervised exploration and few-shot fine-tuning via diffusion models.

Abstract

Unsupervised reinforcement learning (URL) pre-trains agents by exploring diverse states in reward-free environments, aiming to enable efficient adaptation to various downstream tasks. Without extrinsic rewards, prior methods rely on intrinsic objectives, but heterogeneous exploration data demand strong modeling capacity for both intrinsic reward design and policy learning. We introduce the Exploratory Diffusion Model (ExDM), which leverages the expressive power of diffusion models to fit diverse replay-buffer distributions, thus providing accurate density estimates and a score-based intrinsic reward that drives exploration into under-visited regions. This mechanism substantially broadens state coverage and yields robust pre-trained policies. Beyond exploration, ExDM offers theoretical guarantees and practical algorithms for fine-tuning diffusion policies under limited interactions, overcoming instability and computational overhead from multi-step sampling. Extensive experiments on Maze2d and URLB show that ExDM achieves superior exploration and faster downstream adaptation, establishing new state-of-the-art results, particularly in environments with complex structure or cross-embodiment settings. The source code is provided at https://github.com/yingchengyang/ExDM.
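The core idea in the abstract — fit a density model to the replay buffer, then reward the agent for visiting states the model deems unlikely — can be sketched in a few lines. This is a minimal illustration, not the paper's method: ExDM fits a diffusion model, whereas here a simple Gaussian kernel density estimate stands in as the density model, and the `kde_log_density` / `intrinsic_reward` helpers are hypothetical names for exposition.

```python
import numpy as np

def kde_log_density(query, buffer, bandwidth=0.5):
    """Unnormalized log density of `query` states under a Gaussian KDE
    over replay-buffer states (constants drop out when ranking states)."""
    # Squared distances between each query state and each buffer state.
    d2 = ((query[:, None, :] - buffer[None, :, :]) ** 2).sum(-1)
    log_kernels = -d2 / (2 * bandwidth ** 2)
    # Numerically stable log-mean-exp over buffer samples.
    m = log_kernels.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_kernels - m).mean(axis=1))

def intrinsic_reward(states, buffer):
    """Higher reward for states the density model finds unlikely,
    i.e. under-visited regions of the state space."""
    return -kde_log_density(states, buffer)

rng = np.random.default_rng(0)
buffer = rng.normal(0.0, 1.0, size=(512, 2))  # replay buffer clustered at origin
visited = np.array([[0.0, 0.0]])              # well-covered state
novel = np.array([[4.0, 4.0]])                # under-visited state
# The novel state earns a larger intrinsic reward than the visited one.
print(float(intrinsic_reward(novel, buffer)[0]) >
      float(intrinsic_reward(visited, buffer)[0]))
```

Swapping the KDE for a learned diffusion model (whose score function gives access to the fitted density's gradient) recovers the flavor of the score-based reward the abstract describes.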

One-sentence summary·Auto-generated by claude-haiku-4-5-20251001(?)

Proposes ExDM using diffusion models for exploration and policy learning in unsupervised reinforcement learning.

Contributions·Auto-generated by claude-haiku-4-5-20251001(?)
  • Leverages the expressive power of diffusion models to fit diverse replay-buffer distributions, yielding accurate density estimates
  • Derives a score-based intrinsic reward that drives exploration into under-visited regions
  • Provides theoretical guarantees and practical algorithms for fine-tuning diffusion policies under limited interactions
Methods used·Auto-generated by claude-haiku-4-5-20251001(?)
  • Diffusion models
  • Unsupervised reinforcement learning
  • Density estimation
  • Intrinsic motivation
  • Policy learning
Datasets used·Auto-generated by claude-haiku-4-5-20251001(?)
  • Maze2d
  • URLB
Limitations (author-stated)·Auto-generated by claude-haiku-4-5-20251001(?)

Authors did not state explicit limitations.

Future work (author-stated)·Auto-generated by claude-haiku-4-5-20251001(?)

Authors did not state explicit future directions.

Author keywords

  • reinforcement learning
  • diffusion policy
  • unsupervised reinforcement learning
  • exploration
