Compositional Diffusion with Guided Search for Long-Horizon Planning
Utkarsh Aashu Mishra, David He, Yongxin Chen, Danfei Xu
We integrate search into compositional diffusion to scale short-horizon models into long-horizon plans, supporting motion planning, panoramic image synthesis, and long-video generation.
Abstract
Generative models have emerged as powerful tools for planning, with compositional approaches offering particular promise for modeling long-horizon task distributions by composing local, modular generative models. This compositional paradigm spans diverse domains, from multi-step manipulation planning to panoramic image synthesis to long video generation. However, compositional generative models face a critical challenge: when local distributions are multimodal, existing composition methods average incompatible modes, producing plans that are neither locally feasible nor globally coherent. We propose Compositional Diffusion with Guided Search (CDGS), which addresses this *mode averaging* problem by embedding search directly within the diffusion denoising process. Our method explores diverse combinations of local modes through population-based sampling, prunes infeasible candidates using likelihood-based filtering, and enforces global consistency through iterative resampling between overlapping segments. CDGS matches oracle performance on seven robot manipulation tasks, outperforming baselines that lack compositionality or require long-horizon training data. The approach generalizes across domains, enabling coherent text-guided panoramic images and long videos through effective local-to-global message passing. More details: https://cdgsearch.github.io/
Introduces CDGS, which integrates compositional diffusion with guided search for coherent long-horizon plan generation.
- Addresses mode averaging problem by embedding search within diffusion denoising process
- Population-based sampling exploring diverse local mode combinations with likelihood-based filtering
- Iterative resampling between overlapping segments enforcing global consistency
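The three steps above can be illustrated with a toy sketch: two overlapping 1-D "segment models" (Gaussian-mixture scores standing in for the paper's local diffusion models) denoise a population of candidates, averaging scores on the overlap, then periodically filtering by joint likelihood and resampling survivors. All names, mixture modes, and step sizes here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_score(x, modes):
    # Score (grad log-density) of a unit-variance, equal-weight 1-D Gaussian mixture.
    w = np.exp(-0.5 * (x[..., None] - modes) ** 2)
    w /= w.sum(axis=-1, keepdims=True)
    return (w * (modes - x[..., None])).sum(axis=-1)

def local_logp(x, modes):
    # Unnormalized log-likelihood under the same mixture, used only for ranking.
    return np.log(np.exp(-0.5 * (x[..., None] - modes) ** 2).mean(axis=-1) + 1e-12)

# Two overlapping segments of a length-3 "plan": x[0:2] and x[1:3].
# The segment models are multimodal and share only the mode at x = 2.
modes_a = np.array([-2.0, 2.0])
modes_b = np.array([2.0, 6.0])

pop = rng.normal(0.0, 4.0, size=(64, 3))  # population of candidate plans

for step in range(200):
    # Compose local models: average their scores on the shared index 1.
    s = np.zeros_like(pop)
    s[:, :2] += local_score(pop[:, :2], modes_a)
    s[:, 1:] += local_score(pop[:, 1:], modes_b)
    s[:, 1] /= 2.0
    # Noisy gradient (Langevin-style) update of every candidate.
    pop += 0.05 * s + np.sqrt(0.05) * 0.1 * rng.normal(size=pop.shape)

    # Likelihood-based filtering + resampling every 20 steps:
    # keep the better half under the joint local likelihood, duplicate survivors.
    if step % 20 == 19:
        logp = (local_logp(pop[:, :2], modes_a).sum(-1)
                + local_logp(pop[:, 1:], modes_b).sum(-1))
        keep = np.argsort(logp)[len(pop) // 2:]
        pop = np.concatenate([pop[keep], pop[keep]])

# On the overlap, surviving candidates concentrate near the shared mode x = 2,
# rather than averaging the incompatible modes (-2 and 6).
print(np.median(pop[:, 1]))
```

The point of the sketch is the failure mode it avoids: naive score averaging alone can leave candidates stranded between incompatible local modes, while filtering and resampling steer the population toward mode combinations that all overlapping segments agree on.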
Topics
- Compositional diffusion
- Guided search
- Diffusion models
- Planning
- Likelihood-based filtering
Limitations (from the paper)
- Assumes the ability to specify a goal state, which simplifies planning but is extendable to goal-generation or classifier-guided methods.
- Generates plans for a fixed horizon; the framework can handle arbitrary horizons but requires the same start and goal.
- Long-horizon dependencies are communicated through score averaging and resampling; more sophisticated message-passing or attention mechanisms could improve efficiency and coherence.

Authors did not state explicit future directions.
Author keywords
- Diffusion Models
- Compositional Diffusion
- Goal-directed Planning
Related orals
Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs)
RealUID provides universal distillation for matching models without GANs, incorporating real data into one-step generator training.
GLASS Flows: Efficient Inference for Reward Alignment of Flow and Diffusion Models
GLASS Flows samples Markov transitions via inner flow matching models to improve inference-time reward alignment in flow and diffusion models.
Neon: Negative Extrapolation From Self-Training Improves Image Generation
Neon inverts model degradation from self-training by extrapolating away from it, improving generative models with minimal compute.
Generative Human Geometry Distribution
Introduces distribution-over-distribution model combining geometry distributions with two-stage flow matching for human 3D generation.
Cross-Domain Lossy Compression via Rate- and Classification-Constrained Optimal Transport
Cross-domain lossy compression unifies rate and classification constraints via optimal transport framework.