The Spacetime of Diffusion Models: An Information Geometry Perspective
Rafal Karczewski, Markus Heinonen, Alison Pouplin, Søren Hauberg, Vikas K Garg
Abstract
We present a novel geometric perspective on the latent space of diffusion models. We first show that the standard pullback approach, utilizing the deterministic probability flow ODE decoder, is fundamentally flawed. It provably forces geodesics to decode as straight segments in data space, effectively ignoring any intrinsic data geometry beyond the ambient Euclidean space. Complementing this view, diffusion also admits a stochastic decoder via the reverse SDE, which enables an information geometric treatment with the Fisher-Rao metric. However, a choice of $\mathbf{x}_T$ as the latent representation collapses this metric due to memorylessness. We address this by introducing a latent spacetime $\mathbf{z}=(\mathbf{x}_t,t)$ that indexes the family of denoising distributions $p(\mathbf{x}_0 | \mathbf{x}_t)$ across all noise scales, yielding a nontrivial geometric structure. We prove these distributions form an exponential family and derive simulation-free estimators for curve lengths, enabling efficient geodesic computation. The resulting structure induces a principled Diffusion Edit Distance, where geodesics trace minimal sequences of noise and denoise edits between data. We also demonstrate benefits for transition path sampling in molecular systems, including constrained variants such as low-variance transitions and region avoidance. Code is available at: https://github.com/rafalkarczewski/spacetime-geometry.
A spacetime perspective views diffusion latent spaces as Fisher-Rao metric manifolds, enabling efficient geodesic computation without simulation.
- Introduces latent spacetime representation (x_t, t) indexing family of denoising distributions across noise scales
- Proves denoising distributions form exponential family enabling tractable geodesic estimation
- Derives simulation-free estimators for curve lengths enabling efficient geodesic computation in high dimensions
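To make the Fisher-Rao curve-length idea concrete, here is a minimal sketch for the simplest case: a discretized curve through univariate Gaussians, where the Fisher-Rao metric has the closed form $ds^2 = (d\mu^2 + 2\,d\sigma^2)/\sigma^2$. This is not the paper's simulation-free estimator for denoising distributions; the function `fisher_rao_length` is a hypothetical helper used purely for illustration.

```python
import numpy as np

def fisher_rao_length(mus, sigmas):
    """Discrete Fisher-Rao length of a curve through 1-D Gaussians N(mu, sigma^2).

    Uses the closed-form metric ds^2 = (dmu^2 + 2 dsigma^2) / sigma^2 and
    sums segment lengths along the discretized curve (midpoint rule for sigma).
    """
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    dmu = np.diff(mus)
    dsig = np.diff(sigmas)
    sig_mid = 0.5 * (sigmas[:-1] + sigmas[1:])  # midpoint of sigma per segment
    return float(np.sum(np.sqrt(dmu**2 + 2.0 * dsig**2) / sig_mid))

# Translating the mean from 0 to 1 at fixed sigma = 1 has length exactly 1,
# since ds reduces to |dmu| / sigma along this curve.
t = np.linspace(0.0, 1.0, 200)
length = fisher_rao_length(t, np.ones_like(t))  # -> 1.0
```

Note how shrinking $\sigma$ inflates distances: as the distributions sharpen toward Dirac deltas, the same change in $\mu$ costs arbitrarily more length, which mirrors the numerical instability between nearly clean samples discussed in the paper's limitations.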
- Information geometry
- Fisher-Rao metric
- Optimal transport
- Exponential families
- Molecular systems
- Image datasets
- Optimizing between nearly clean samples is numerically unstable because the denoising distributions collapse to Dirac deltas, yielding effectively infinite distances (from the paper)
- The proposed Diffusion Edit Distance is considerably slower than established similarity metrics such as LPIPS and SSIM (from the paper)
- Future direction: explore a distillation strategy that trains a separate model to predict the Diffusion Edit Distance (from the paper)
Author keywords
- diffusion models
- information geometry
Related orals
Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs)
RealUID provides universal distillation for matching models without GANs, incorporating real data into one-step generator training.
GLASS Flows: Efficient Inference for Reward Alignment of Flow and Diffusion Models
GLASS Flows samples Markov transitions via inner flow matching models to improve inference-time reward alignment in flow and diffusion models.
Neon: Negative Extrapolation From Self-Training Improves Image Generation
Neon inverts model degradation from self-training by extrapolating away from it, improving generative models with minimal compute.
Generative Human Geometry Distribution
Introduces distribution-over-distribution model combining geometry distributions with two-stage flow matching for human 3D generation.
Cross-Domain Lossy Compression via Rate- and Classification-Constrained Optimal Transport
Cross-domain lossy compression unifies rate and classification constraints via optimal transport framework.