Mamba-3: Improved Sequence Modeling using State Space Principles
Aakash Lahoti, Kevin Li, Berlin Chen, Caitlin Wang, Aviv Bick, J Zico Kolter, Tri Dao, Albert Gu
Mamba-3 is an inference-first SSM that pushes on core SSM principles: improved discretization for better quality, complex dynamics for new capabilities, and MIMO updates for efficient inference.
Abstract
Scaling inference-time compute has emerged as an important driver of LLM performance, making inference efficiency a central focus of model design alongside model quality. While current Transformer models deliver strong quality, their quadratic compute and linear memory requirements make inference expensive. This has spurred the development of sub-quadratic models with reduced compute and constant memory requirements. However, many recent linear models trade off model quality and capability for algorithmic efficiency, failing on tasks such as state tracking. Moreover, their theoretically linear inference remains hardware-inefficient in practice. Guided by an inference-first perspective, we introduce three core methodological improvements inspired by the state space model (SSM) viewpoint of linear models. We combine: (1) a more expressive recurrence derived from SSM discretization, (2) a complex-valued state update rule enabling richer state tracking, and (3) a multi-input, multi-output (MIMO) formulation that improves model performance without increasing decode latency. Together with architectural refinements, Mamba-3 achieves significant gains across retrieval, state-tracking, and downstream language modeling tasks. At the 1.5B scale, Mamba-3 improves average downstream accuracy by 0.6 percentage points compared to the next best model (Gated DeltaNet), with the MIMO variant further improving accuracy by an additional 1.2 points, for a total gain of 1.8 points. Across state-size experiments, Mamba-3 achieves comparable perplexity to Mamba-2 despite using half the state size. These results demonstrate that Mamba-3 advances the performance–efficiency frontier.
Mamba-3 achieves a 1.8-percentage-point average accuracy gain over the next best model (Gated DeltaNet) at the 1.5B scale via a more expressive recurrence, complex-valued state updates, and a MIMO formulation.
- More expressive recurrence derived from SSM discretization using exponential-trapezoidal method
- Complex-valued state update rule enabling richer state tracking for retrieval and tracking tasks
- MIMO formulation improving model performance without increasing decode latency
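The three ideas above can be illustrated with a toy diagonal-SSM scan. The NumPy sketch below is illustrative only: it assumes the generic first-order recurrence h_t = a_t · h_{t-1} + b_t · x_t as a baseline, and the trapezoidal coefficients, complex parameterization, function names, and tensor shapes are all hypothetical stand-ins, not the paper's actual discretization or kernels.

```python
import numpy as np

def scan_euler(a, b, x):
    # Baseline first-order recurrence: h_t = a_t * h_{t-1} + b_t * x_t.
    # a, b: (T, n) per-step decay and input gates; x: (T,) scalar inputs.
    h = np.zeros_like(b[0])
    out = []
    for t in range(len(x)):
        h = a[t] * h + b[t] * x[t]
        out.append(h.copy())
    return np.array(out)

def scan_trapezoidal(a, b, x):
    # Trapezoidal-style variant (illustrative coefficients): average the
    # current and previous inputs instead of using only the current one,
    # h_t = a_t * h_{t-1} + 0.5 * (b_t * x_t + b_{t-1} * x_{t-1}).
    h = np.zeros_like(b[0])
    prev = 0.0
    out = []
    for t in range(len(x)):
        cur = b[t] * x[t]
        h = a[t] * h + 0.5 * (cur + prev)
        prev = cur
        out.append(h.copy())
    return np.array(out)

def scan_complex(a, theta, b, x):
    # Complex-valued dynamics: multiplying by a_t * exp(i * theta_t)
    # rotates the state each step, giving oscillatory behavior that a
    # purely real (monotone-decay) recurrence cannot express.
    h = np.zeros_like(b[0], dtype=complex)
    out = []
    for t in range(len(x)):
        h = a[t] * np.exp(1j * theta[t]) * h + b[t] * x[t]
        out.append(h.copy())
    return np.array(out)

def mimo_step(H, a, B, X):
    # MIMO write: a rank-r update B @ X feeds the matrix state instead of
    # the usual rank-1 outer product, using the same state size per step.
    # H: (n, d) state; B: (n, r) input matrix; X: (r, d) input block.
    return a * H + B @ X
```

As a design intuition under these assumptions: the trapezoidal-style rule changes how inputs are integrated into the state (quality), the complex rotation enlarges the class of dynamics the recurrence can realize (state tracking), and the MIMO write densifies each state update without growing the state that must be kept in memory at decode time.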
- State space models
- SSM discretization
- Linear attention
- Sequence modeling
Authors did not state explicit limitations.
- Explore hybrid Mamba-3 architectures integrating retrieval mechanisms
- Broaden application of design principles to other linear-time sequence models
Author keywords
- State Space Models
- Mamba
- LLMs
- Subquadratic Models
Related orals
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Benchmarks practical privacy risks in differential privacy-adapted LLMs, revealing that distribution shifts and model choice impact protection effectiveness.
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
Proposes Recursive Likelihood Ratio optimizer for efficient fine-tuning of diffusion models with lower variance gradient estimation.
Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
Demonstrates LLMs can be finetuned to generate harmful steganographically-hidden outputs while appearing benign to safety systems.
Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents
Proposes T3 algorithm to detect belief deviation in LLM agents and truncate trajectories for improved reinforcement learning in active reasoning tasks.
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
RefineStat enforces semantic constraints and applies diagnostic-aware refinement for synthesizing valid probabilistic programs from smaller language models.