Compositional Diffusion with Guided Search for Long-Horizon Planning
Utkarsh Aashu Mishra, David He, Yongxin Chen, Danfei Xu
We integrate search into compositional diffusion to scale short-horizon models into long-horizon plans, supporting motion planning, panoramic image synthesis, and long-video generation.
Abstract
Generative models have emerged as powerful tools for planning, with compositional approaches offering particular promise for modeling long-horizon task distributions by composing local, modular generative models. This compositional paradigm spans diverse domains, from multi-step manipulation planning to panoramic image synthesis to long video generation. However, compositional generative models face a critical challenge: when local distributions are multimodal, existing composition methods average incompatible modes, producing plans that are neither locally feasible nor globally coherent. We propose Compositional Diffusion with Guided Search (CDGS), which addresses this *mode averaging* problem by embedding search directly within the diffusion denoising process. Our method explores diverse combinations of local modes through population-based sampling, prunes infeasible candidates using likelihood-based filtering, and enforces global consistency through iterative resampling between overlapping segments. CDGS matches oracle performance on seven robot manipulation tasks, outperforming baselines that lack compositionality or require long-horizon training data. The approach generalizes across domains, enabling coherent text-guided panoramic images and long videos through effective local-to-global message passing. More details: https://cdgsearch.github.io/
Introduces CDGS, which integrates compositional diffusion with guided search for coherent long-horizon plan generation.
- Addresses mode averaging problem by embedding search within diffusion denoising process
- Population-based sampling exploring diverse local mode combinations with likelihood-based filtering
- Iterative resampling between overlapping segments enforcing global consistency
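The three steps above can be illustrated with a toy sketch: two overlapping 1-D "segment models" (Gaussian-mixture scores standing in for the paper's local diffusion models) denoise a population of candidates, averaging scores on the overlap, then periodically filtering by joint likelihood and resampling survivors. All names, mixture modes, and step sizes here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_score(x, modes):
    # Score (grad log-density) of a unit-variance, equal-weight 1-D Gaussian mixture.
    w = np.exp(-0.5 * (x[..., None] - modes) ** 2)
    w /= w.sum(axis=-1, keepdims=True)
    return (w * (modes - x[..., None])).sum(axis=-1)

def local_logp(x, modes):
    # Unnormalized log-likelihood under the same mixture, used only for ranking.
    return np.log(np.exp(-0.5 * (x[..., None] - modes) ** 2).mean(axis=-1) + 1e-12)

# Two overlapping segments of a length-3 "plan": x[0:2] and x[1:3].
# The segment models are multimodal and share only the mode at x = 2.
modes_a = np.array([-2.0, 2.0])
modes_b = np.array([2.0, 6.0])

pop = rng.normal(0.0, 4.0, size=(64, 3))  # population of candidate plans

for step in range(200):
    # Compose local models: average their scores on the shared index 1.
    s = np.zeros_like(pop)
    s[:, :2] += local_score(pop[:, :2], modes_a)
    s[:, 1:] += local_score(pop[:, 1:], modes_b)
    s[:, 1] /= 2.0
    # Noisy gradient (Langevin-style) update of every candidate.
    pop += 0.05 * s + np.sqrt(0.05) * 0.1 * rng.normal(size=pop.shape)

    # Likelihood-based filtering + resampling every 20 steps:
    # keep the better half under the joint local likelihood, duplicate survivors.
    if step % 20 == 19:
        logp = (local_logp(pop[:, :2], modes_a).sum(-1)
                + local_logp(pop[:, 1:], modes_b).sum(-1))
        keep = np.argsort(logp)[len(pop) // 2:]
        pop = np.concatenate([pop[keep], pop[keep]])

# On the overlap, surviving candidates concentrate near the shared mode x = 2,
# rather than averaging the incompatible modes (-2 and 6).
print(np.median(pop[:, 1]))
```

The point of the sketch is the failure mode it avoids: naive score averaging alone can leave candidates stranded between incompatible local modes, while filtering and resampling steer the population toward mode combinations that all overlapping segments agree on.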
Topics
- Compositional diffusion
- Guided search
- Diffusion models
- Planning
- Likelihood-based filtering
Limitations (from the paper)
- Assumes the ability to specify a goal state, which simplifies planning but is extendable to goal-generation or classifier-guided methods.
- Generates plans for a fixed horizon; the framework can handle arbitrary horizons but requires the same start and goal.
- Long-horizon dependencies are communicated through score averaging and resampling; more sophisticated message-passing or attention mechanisms could improve efficiency and coherence.

Authors did not state explicit future directions.
Author keywords
- Diffusion Models
- Compositional Diffusion
- Goal-directed Planning
Related orals
Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs)
RealUID provides universal distillation for matching models without GANs, incorporating real data into one-step generator training.
GLASS Flows: Efficient Inference for Reward Alignment of Flow and Diffusion Models
GLASS Flows samples Markov transitions via inner flow matching models to improve inference-time reward alignment in flow and diffusion models.
Neon: Negative Extrapolation From Self-Training Improves Image Generation
Neon inverts model degradation from self-training by extrapolating away from it, improving generative models with minimal compute.
Generative Human Geometry Distribution
Introduces distribution-over-distribution model combining geometry distributions with two-stage flow matching for human 3D generation.
Cross-Domain Lossy Compression via Rate- and Classification-Constrained Optimal Transport
Cross-domain lossy compression unifies rate and classification constraints via optimal transport framework.