Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
Wuyang Li, Wentao Pan, Po-Chien Luan, Yang Gao, Alexandre Alahi
Abstract
We propose **Stable Video Infinity (SVI)**, which generates non-looping, ultra-long videos with stable visual quality, while supporting per-clip prompt control and multi-modal conditioning. While existing long-video methods attempt to _**mitigate accumulated errors**_ via handcrafted anti-drifting strategies (e.g., modified noise schedulers, frame anchoring), they remain limited to single-prompt extrapolation, producing homogeneous scenes with repetitive motions. We identify that the fundamental challenge extends beyond error accumulation to a critical discrepancy between the training assumption (seeing clean data) and the test-time autoregressive reality (conditioning on self-generated, error-prone outputs). To bridge this hypothesis gap, SVI incorporates **Error-Recycling Fine-Tuning**, an efficient training scheme that recycles the Diffusion Transformer (DiT)’s self-generated errors into supervisory signals, thereby encouraging the DiT to _**actively identify and correct its own errors**_. This is achieved by injecting, collecting, and banking errors through closed-loop recycling, autoregressively learning from error-injected feedback. Specifically, we (i) inject historical errors made by the DiT to intervene on clean inputs, simulating error-accumulated trajectories in flow matching; (ii) efficiently approximate predictions with one-step bidirectional integration and compute errors as residuals; (iii) dynamically bank errors into replay memory across discretized timesteps, from which errors are resampled for new inputs. SVI scales videos from seconds to infinite durations with no additional inference cost, while remaining compatible with diverse conditions (e.g., audio, skeleton, and text streams). We evaluate SVI on three benchmarks, covering consistent, creative, and conditional settings, thoroughly verifying its versatility and state-of-the-art performance.
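The closed loop described above (inject, approximate, bank) can be made concrete with a short sketch. The Python below is a minimal illustration under our own assumptions, not the authors' implementation: a toy `nn.Sequential` stands in for the DiT, random tensors stand in for video latents, and names such as `ErrorBank` are invented here for exposition; the one-step prediction is a single forward Euler step, whereas the paper describes a bidirectional integration.

```python
# Minimal sketch of Error-Recycling Fine-Tuning as summarized in the abstract.
# All names (ErrorBank, dit, n_bins, ...) are illustrative assumptions.
import random
import torch
import torch.nn as nn

class ErrorBank:
    """Replay memory banking residual errors per discretized timestep bin."""

    def __init__(self, n_bins: int = 10, capacity: int = 256):
        self.n_bins = n_bins
        self.bins = [[] for _ in range(n_bins)]
        self.capacity = capacity

    def _bin(self, t: float) -> int:
        return min(int(t * self.n_bins), self.n_bins - 1)

    def push(self, t: float, error: torch.Tensor) -> None:
        bucket = self.bins[self._bin(t)]
        if len(bucket) >= self.capacity:
            bucket.pop(random.randrange(len(bucket)))  # evict at random
        bucket.append(error.detach())

    def sample(self, t: float) -> torch.Tensor | None:
        bucket = self.bins[self._bin(t)]
        return random.choice(bucket) if bucket else None

# Toy stand-ins: a "DiT" predicting flow-matching velocity on flat latents.
dit = nn.Sequential(nn.Linear(64, 256), nn.SiLU(), nn.Linear(256, 64))
opt = torch.optim.AdamW(dit.parameters(), lr=1e-4)
bank = ErrorBank()

for step in range(100):
    x1 = torch.randn(8, 64)        # clean target latents (placeholder data)
    x0 = torch.randn_like(x1)      # noise sample
    t = random.random()

    # (i) Inject a banked historical error into the clean input, simulating
    # the error-accumulated trajectories seen at autoregressive test time.
    err = bank.sample(t)
    x1_in = x1 + err if err is not None else x1

    xt = (1 - t) * x0 + t * x1_in  # flow-matching interpolant
    v_pred = dit(xt)

    # Supervise against the clean velocity target, so the model learns to
    # steer error-injected states back toward clean data.
    loss = ((v_pred - (x1 - x0)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

    # (ii) One-step integration to an estimate of x1; the residual is the
    # new error. (iii) Bank it for replay at this timestep bin.
    with torch.no_grad():
        x1_hat = xt + (1 - t) * dit(xt)
        bank.push(t, x1_hat - x1)
```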
Generates ultra-long videos by actively correcting self-generated errors through error-recycling fine-tuning.
- Identifies a training-test hypothesis gap in long video generation that leads to accumulated errors
- Proposes Error-Recycling Fine-Tuning, enabling models to actively correct their own errors
- Injects historical errors to intervene on clean inputs, simulating error-accumulated trajectories (see the equations after this list)
- Scales videos from seconds to infinite durations without additional inference cost
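For step (ii) of the abstract, a plausible formalization of the one-step bidirectional integration under standard rectified-flow conventions (the symbols $x_0$, $x_1$, $v_\theta$, and $e_t$ are our notation and may differ from the paper's exact parameterization):

```latex
% Hedged sketch: x_0 is noise, x_1 the clean latent, v_theta the DiT velocity.
\begin{align}
  x_t &= (1 - t)\, x_0 + t\, x_1, \qquad t \in [0, 1] \\
  \hat{x}_1 &= x_t + (1 - t)\, v_\theta(x_t, t)
      && \text{(one-step forward integration)} \\
  \hat{x}_0 &= x_t - t\, v_\theta(x_t, t)
      && \text{(one-step backward integration)} \\
  e_t &= \hat{x}_1 - x_1
      && \text{(residual error, banked at the bin of } t\text{)}
\end{align}
```

Banking $e_t$ per discretized timestep bin and re-injecting it into later clean inputs is what closes the recycling loop.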
- Error recycling
- Flow matching
- Diffusion transformers
Authors did not state explicit limitations.
Authors did not state explicit future directions.
Author keywords
- Infinite-Length Video Generation
- Error Accumulation
Related orals
Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs)
RealUID provides universal distillation for matching models without GANs, incorporating real data into one-step generator training.
GLASS Flows: Efficient Inference for Reward Alignment of Flow and Diffusion Models
GLASS Flows samples Markov transitions via inner flow matching models to improve inference-time reward alignment in flow and diffusion models.
Neon: Negative Extrapolation From Self-Training Improves Image Generation
Neon inverts model degradation from self-training by extrapolating away from it, improving generative models with minimal compute.
Generative Human Geometry Distribution
Introduces a distribution-over-distribution model combining geometry distributions with two-stage flow matching for 3D human generation.
Cross-Domain Lossy Compression via Rate- and Classification-Constrained Optimal Transport
Cross-domain lossy compression unifies rate and classification constraints via an optimal transport framework.