ICLR 2026 Orals

Compositional Diffusion with Guided Search for Long-Horizon Planning

Utkarsh Aashu Mishra, David He, Yongxin Chen, Danfei Xu

Diffusion & Flow Matching · Fri, Apr 24 · 4:27 PM–4:37 PM · 201 A/B · Avg rating: 6.50 (6–8)
Author-provided TL;DR

We integrate search into compositional diffusion to scale short-horizon models into long-horizon plans, supporting motion planning, panoramic image synthesis, and long-video generation.

Abstract

Generative models have emerged as powerful tools for planning, with compositional approaches offering particular promise for modeling long-horizon task distributions by composing together local, modular generative models. This compositional paradigm spans diverse domains, from multi-step manipulation planning to panoramic image synthesis to long video generation. However, compositional generative models face a critical challenge: when local distributions are multimodal, existing composition methods average incompatible modes, producing plans that are neither locally feasible nor globally coherent. We propose Compositional Diffusion with Guided Search (CDGS), which addresses this \emph{mode averaging} problem by embedding search directly within the diffusion denoising process. Our method explores diverse combinations of local modes through population-based sampling, prunes infeasible candidates using likelihood-based filtering, and enforces global consistency through iterative resampling between overlapping segments. CDGS matches oracle performance on seven robot manipulation tasks, outperforming baselines that lack compositionality or require long-horizon training data. The approach generalizes across domains, enabling coherent text-guided panoramic images and long videos through effective local-to-global message passing. More details: https://cdgsearch.github.io/

One-sentence summary·Auto-generated by claude-haiku-4-5-20251001

Introduces CDGS, which integrates compositional diffusion with guided search to generate coherent long-horizon plans.

Contributions·Auto-generated by claude-haiku-4-5-20251001
  • Addresses the mode-averaging problem by embedding search within the diffusion denoising process
  • Population-based sampling that explores diverse combinations of local modes, with likelihood-based filtering
  • Iterative resampling between overlapping segments to enforce global consistency
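The three steps above can be sketched on a toy problem. This is a hypothetical, self-contained illustration, not the authors' implementation: hand-coded Gaussian-mixture scores stand in for learned short-horizon diffusion models, the 1-D "plan" is split into overlapping segments, each segment has two local modes (all-0 or all-1) that must agree globally, and all names (`cdgs_sketch`, `local_score`, `local_loglik`) are invented for this sketch.

```python
# Toy sketch of population-based denoising with likelihood-based pruning and
# score averaging on overlaps (hypothetical code, not the paper's method).
import numpy as np

rng = np.random.default_rng(0)

def local_score(x_seg, centers):
    """Score (grad log-density) of a unit-variance Gaussian mixture."""
    diffs = centers - x_seg                      # (n_modes, seg_len)
    w = np.exp(-0.5 * (diffs ** 2).sum(axis=1))  # unnormalized mode weights
    w = w / (w.sum() + 1e-12)
    return (w[:, None] * diffs).sum(axis=0)      # pull toward weighted modes

def local_loglik(x_seg, centers):
    """Unnormalized log-likelihood of one segment under the mixture."""
    d2 = ((centers - x_seg) ** 2).sum(axis=1)
    return np.log(np.exp(-0.5 * d2).sum() + 1e-12)

def cdgs_sketch(pop=16, T=12, seg=6, stride=3, steps=200):
    centers = np.stack([np.zeros(seg), np.ones(seg)])  # two local modes
    X = rng.normal(size=(pop, T))                      # candidate population
    starts = list(range(0, T - seg + 1, stride))       # overlapping segments
    sigma = 1.0
    for t in range(steps):
        sigma *= 0.98                                  # annealed noise level
        score = np.zeros_like(X)
        count = np.zeros(T)
        for s in starts:                               # average scores on overlaps
            for i in range(pop):
                score[i, s:s + seg] += local_score(X[i, s:s + seg], centers)
            count[s:s + seg] += 1
        X += 0.1 * score / count + 0.05 * sigma * rng.normal(size=X.shape)
        if t % 25 == 24:                               # prune + resample
            ll = np.array([sum(local_loglik(X[i, s:s + seg], centers)
                               for s in starts) for i in range(pop)])
            keep = np.argsort(ll)[pop // 2:]           # likelihood-based filter
            X = X[np.concatenate([keep, keep])]        # duplicate survivors
    return X

plans = cdgs_sketch()
```

In this toy setup, a candidate whose overlapping segments commit to different modes gets its overlap coordinates pulled toward 0.5 by score averaging (the mode-averaging failure the paper describes); the likelihood filter scores such candidates poorly and replaces them with copies of consistent ones, so surviving plans should settle near a single coherent mode.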
Methods used·Auto-generated by claude-haiku-4-5-20251001
  • Compositional diffusion
  • Guided search
  • Diffusion models
  • Planning
  • Likelihood-based filtering
Limitations (author-stated)·Auto-generated by claude-haiku-4-5-20251001
  • Assumes the ability to specify a goal state, which simplifies planning; extendable to goal-generation or classifier-guided methods
  • Generates plans for a fixed horizon; the framework can handle arbitrary horizons but requires the same start and goal
  • Long-horizon dependencies are communicated through score averaging and resampling; more sophisticated message-passing or attention mechanisms could improve efficiency and coherence
Future work (author-stated)·Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit future directions.

Author keywords

  • Diffusion Models
  • Compositional Diffusion
  • Goal-directed Planning

Related orals

Generative Human Geometry Distribution

Introduces a distribution-over-distribution model that combines geometry distributions with two-stage flow matching for 3D human generation.

Avg rating: 5.50 (2–8) · Xiangjun Tang et al.