ICLR 2026 Orals

Mamba-3: Improved Sequence Modeling using State Space Principles

Aakash Lahoti, Kevin Li, Berlin Chen, Caitlin Wang, Aviv Bick, J Zico Kolter, Tri Dao, Albert Gu

LLMs & Reasoning · Fri, Apr 24 · 4:03 PM–4:13 PM · Amphitheater · Avg rating: 7.00 (6–8)
Author-provided TL;DR

Mamba-3, an inference-first SSM that pushes on core SSM principles: improved discretization for better quality, complex dynamics for new capabilities, and MIMO updates for efficient inference.

Abstract

Scaling inference-time compute has emerged as an important driver of LLM performance, making inference efficiency a central focus of model design alongside model quality. While current Transformer models deliver strong quality, their quadratic compute and linear memory requirements make inference expensive. This has spurred the development of sub-quadratic models with reduced compute and constant memory requirements. However, many recent linear models trade off model quality and capability for algorithmic efficiency, failing on tasks such as state tracking. Moreover, their theoretically linear inference remains hardware-inefficient in practice. Guided by an inference-first perspective, we introduce three core methodological improvements inspired by the state space model (SSM) viewpoint of linear models. We combine: (1) a more expressive recurrence derived from SSM discretization, (2) a complex-valued state update rule enabling richer state tracking, and (3) a multi-input, multi-output (MIMO) formulation that improves model performance without increasing decode latency. Together with architectural refinements, Mamba-3 achieves significant gains across retrieval, state-tracking, and downstream language modeling tasks. At the 1.5B scale, Mamba-3 improves average downstream accuracy by 0.6 percentage points compared to the next best model (Gated DeltaNet), with the MIMO variant further improving accuracy by an additional 1.2 points, for a total gain of 1.8 points. Across state-size experiments, Mamba-3 achieves comparable perplexity to Mamba-2 despite using half the state size. These results demonstrate that Mamba-3 advances the performance–efficiency frontier.
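
To ground the first improvement: an SSM recurrence is a discretization of a continuous-time system, and the quadrature rule used for the input term determines what the recurrence can express. Below is a minimal sketch for a scalar SSM, contrasting the usual first-order input treatment with a textbook exponential-trapezoidal rule; the function names are illustrative, and the exact rule Mamba-3 derives may differ.

```python
import numpy as np

# Illustrative scalar SSM x'(t) = a*x(t) + b*u(t), discretized two ways.

def step_zoh_style(x, u_t, a, b, dt):
    # Common SSM step (Mamba-2-style): exact exponential decay of the
    # state, first-order (rectangle-rule) treatment of the input.
    decay = np.exp(a * dt)
    return decay * x + dt * b * u_t

def step_exp_trapezoid(x, u_prev, u_t, a, b, dt):
    # Exponential-trapezoidal step: the input integral
    #   int_0^dt exp(a*(dt - s)) * b * u(t + s) ds
    # is approximated with the trapezoid rule, averaging the current input
    # with the previous input propagated through the decay.
    decay = np.exp(a * dt)
    return decay * x + 0.5 * dt * b * (u_t + decay * u_prev)

x = step_exp_trapezoid(x=0.0, u_prev=1.0, u_t=1.0, a=-0.5, b=1.0, dt=0.1)
```

The trapezoidal variant is second-order accurate in the step size and mixes two consecutive inputs under the same exponential decay, one way a discretization choice can buy expressivity without growing the state.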

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

Mamba-3 achieves up to a 1.8 percentage point accuracy gain over Gated DeltaNet via a more expressive recurrence, complex-valued state updates, and a MIMO formulation.

Contributions · Auto-generated by claude-haiku-4-5-20251001
  • More expressive recurrence derived from SSM discretization using an exponential-trapezoidal method (sketched after the abstract above)
  • Complex-valued state update rule enabling richer dynamics for state-tracking and retrieval tasks (see the rotation sketch after this list)
  • MIMO formulation improving model performance without increasing decode latency (see the MIMO sketch after this list)
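
Why complex dynamics enable richer state tracking, in a toy sketch: a complex pole r·e^(iθ) combines decay (r < 1) with rotation (θ), and rotation lets the state cycle through positions instead of only shrinking, which is all a real diagonal recurrence can do. The mod-5 counting task and all names below are invented for illustration, not the paper's setup.

```python
import numpy as np

# Toy task: track the token position mod 5 with one complex state.
# A unit-magnitude pole exp(i*2*pi/5) rotates the state a fifth of a turn
# per step, so the state's angle encodes the running count. A real pole
# (theta = 0) can only scale the state and cannot realize this cycle.

theta = 2 * np.pi / 5
pole = np.exp(1j * theta)      # |pole| = 1: pure rotation, no decay

x = 1.0 + 0.0j                 # state starts at angle 0
for step in range(1, 8):
    x = pole * x               # complex-valued state update
    count = round((np.angle(x) % (2 * np.pi)) / theta) % 5
    print(step, count)         # recovers step mod 5 from the state alone
```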
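
And a rough sketch of the MIMO idea as it appears in linear recurrences generally: a single-input, single-output step writes a rank-1 outer product into the state, while a MIMO step writes a rank-r block, injecting more information per step into the same fixed-size state. Since decoding is typically bound by reading and writing that state, the extra matmul adds little latency. Shapes and names here are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

d, n, r = 16, 32, 4    # model dim, state dim, MIMO rank (illustrative)
rng = np.random.default_rng(0)

def step_siso(S, decay, k, v):
    # SISO-style update: one (key, value) pair -> rank-1 write k v^T.
    return decay * S + np.outer(k, v)

def step_mimo(S, decay, K, V):
    # MIMO update: r pairs at once -> rank-r write K^T V. The state shape
    # is unchanged, so decode-time memory traffic stays the same; only a
    # small extra matmul is added.
    return decay * S + K.T @ V

S = np.zeros((n, d))                      # fixed-size recurrent state
k, v = rng.standard_normal(n), rng.standard_normal(d)
K, V = rng.standard_normal((r, n)), rng.standard_normal((r, d))
print(step_siso(S, 0.9, k, v).shape)      # (32, 16): rank-1 update
print(step_mimo(S, 0.9, K, V).shape)      # (32, 16): rank-4, same state size
```
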
Methods used · Auto-generated by claude-haiku-4-5-20251001
  • State space models
  • SSM discretization
  • Linear attention
  • Sequence modeling
Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001

The authors did not state explicit limitations.

Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • Explore hybrid Mamba-3 architectures integrating retrieval mechanisms
  • Broaden application of the design principles to other linear-time sequence models

Author keywords

  • State Space Models
  • Mamba
  • LLMs
  • Subquadratic Models
