Structured Flow Autoencoders: Learning Structured Probabilistic Representations with Flow Matching
Yidan Xu, Yixin Wang, XuanLong Nguyen
A framework that composes any probabilistic graphical model with flow matching, jointly learning structured latent representations and high-fidelity generative models through a single objective.
Abstract
Flow matching is a powerful approach for high-fidelity density estimation, but it often fails to capture the latent structure of complex data. Probabilistic models like variational autoencoders (VAEs), on the other hand, learn structured representations but underperform in sample quality. We propose Structured Flow Autoencoders (SFA), a family of probabilistic models that augments graphical models with conditional continuous normalizing flow (CNF) likelihoods, enabling flow-matching-based structured representation learning. At the core of SFA is a novel flow matching objective that explicitly accounts for latent variables, allowing joint learning of the CNF likelihood and posterior. SFA applies broadly to graphical models with continuous and mixture latents, as well as latent dynamical systems. Empirical studies across image, video, and RNA-seq data show that SFA consistently outperforms VAEs and their structured extensions in generation quality, representation utility, and scalability to large datasets. Compared to generative models like latent flow matching (LatentFM), SFA also produces more diverse samples, suggesting better coverage of the data distribution.
Structured Flow Autoencoders integrate flow matching with graphical models for structured representation learning.
- A structured conditional flow matching objective that explicitly accounts for latent variables
- Enables joint learning of CNF likelihood and posterior for probabilistic graphical models
- Framework applicable to continuous, mixture, and dynamical latent structures
- Outperforms VAEs and latent flow matching on generation quality and representation utility
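To make the objective concrete, below is a minimal sketch of one Monte Carlo estimate of a latent-conditional flow matching loss, in the spirit of the structured objective described above: a latent `z` is drawn from a posterior over the data, and the velocity model is regressed onto the target velocity of a linear noise-to-data path while conditioning on `z`. The `encoder` and `vector_field` functions here are hypothetical stand-ins (a toy Gaussian posterior and a linear velocity model), not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x):
    # Hypothetical Gaussian posterior q(z|x): a fixed linear map plus noise,
    # standing in for a learned amortized encoder.
    mu = 0.5 * x.mean(axis=1, keepdims=True)
    return mu + 0.1 * rng.standard_normal(mu.shape)

def vector_field(x_t, t, z, W):
    # Hypothetical linear velocity model v_theta(x_t, t, z):
    # concatenate the point on the path, the time, and the latent.
    feats = np.concatenate([x_t, np.full((x_t.shape[0], 1), t), z], axis=1)
    return feats @ W

def structured_cfm_loss(x, W):
    """One Monte Carlo estimate of a latent-conditional flow matching loss.

    Linear probability path: x_t = (1 - t) * x0 + t * x with noise endpoint x0,
    target velocity u = x - x0, and the velocity model conditioned on z ~ q(z|x).
    """
    n, d = x.shape
    z = encoder(x)                       # sample latent from the (toy) posterior
    x0 = rng.standard_normal((n, d))     # noise endpoint of the path
    t = rng.uniform()                    # random time in [0, 1]
    x_t = (1 - t) * x0 + t * x           # point on the conditional path
    u = x - x0                           # target velocity for the linear path
    v = vector_field(x_t, t, z, W)
    return float(np.mean((v - u) ** 2))

x = rng.standard_normal((8, 4))
W = 0.01 * rng.standard_normal((4 + 1 + 1, 4))  # weights for [x_t, t, z] -> velocity
loss = structured_cfm_loss(x, W)
```

In the actual framework the posterior and CNF likelihood are learned jointly through this single regression-style objective; the sketch only shows where the latent enters the flow matching loss.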
- Flow matching
- Graphical models
- Continuous normalizing flows
- Variational autoencoders
Limitations
- The CNF posterior variant remains computationally expensive, requiring an ODE solve at each gradient step
- Principled selection strategies for the posterior family remain an open question
- Designing UNet-based decoders that condition on the learned stochastic latent poses an architectural challenge
- Standard conditioning mechanisms may be insufficient when the latent carries distributional uncertainty
Authors did not state explicit future directions.
Author keywords
- Flow Matching
- Probabilistic Model
- Representation Learning
- Probabilistic Graphical Model
- Autoencoder
Related orals
Causal Structure Learning in Hawkes Processes with Complex Latent Confounder Networks
Develops causal structure learning framework for Hawkes processes identifying latent confounder subprocesses.
CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data
Generates diverse synthetic time series for pretraining foundation models with clear scaling laws.
Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Optimization
Solves optimal multi-draft speculative sampling via convex optimization achieving 90% acceptance rates.
Conformal Robustness Control: A New Strategy for Robust Decision
CRC optimizes prediction set construction under explicit robustness constraints instead of coverage for more efficient robust decisions.