To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models

Eran Malach, Omid Saremi, Sinead Williamson, Arwen Bradley, Aryo Lotfi, Emmanuel Abbe, Joshua M. Susskind, Etai Littwin

LLMs & Reasoning · Sat, Apr 25 · 11:18 AM–11:28 AM · 202 A/B · Avg rating: 7.00 (4–8)

Abstract

State Space Models (SSMs) have become the leading alternative to Transformers for sequence modeling tasks. Their primary advantage is efficiency in long-context and long-form generation, enabled by fixed-size memory and linear scaling of computational complexity. We begin this work by showing a simple theoretical result stating that SSMs cannot accurately solve any "truly long-form" generation problem (in a sense we formally define), undermining their main competitive advantage. However, we show that this limitation can be mitigated by allowing SSMs interactive access to external tools. In fact, we show that given the right choice of tool access and problem-dependent training data, SSMs can learn to solve any tractable problem and generalize to arbitrary problem length/complexity (i.e., achieve length generalization). Following our theoretical finding, we demonstrate that tool-augmented SSMs achieve remarkable length generalization on a variety of arithmetic, reasoning, and coding tasks. These findings highlight SSMs as a potential efficient alternative to Transformers in interactive tool-based and agentic settings.
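The contrast between fixed-size memory and interactive tool access can be illustrated with a toy example. The sketch below is not the paper's method or its tool interface: `DigitTape` and `add_with_tool` are hypothetical names, and the "controller" is a hand-written loop rather than a trained SSM. It only shows that a controller whose entire internal state is a single carry bit can add numbers of any length, provided the operands and the output live in an external tool that it reads and writes one step at a time.

```python
class DigitTape:
    """Hypothetical external tool: stores operands and output outside the controller."""

    def __init__(self, a: str, b: str):
        self._a = a[::-1]  # least-significant digit first
        self._b = b[::-1]
        self._out = []     # output digits, least-significant first
        self._pos = 0      # pointer maintained by the tool, not the controller

    def read(self):
        """Return the digit pair at the pointer, or None once both operands are exhausted."""
        if self._pos >= max(len(self._a), len(self._b)):
            return None
        da = int(self._a[self._pos]) if self._pos < len(self._a) else 0
        db = int(self._b[self._pos]) if self._pos < len(self._b) else 0
        return da, db

    def write_and_advance(self, digit: int) -> None:
        """Append one output digit and move the pointer one position forward."""
        self._out.append(str(digit))
        self._pos += 1

    def result(self) -> str:
        """Return the accumulated output, most-significant digit first."""
        return "".join(reversed(self._out)).lstrip("0") or "0"


def add_with_tool(a: str, b: str) -> str:
    """Add two decimal strings with a controller that remembers only a carry bit."""
    tape = DigitTape(a, b)
    carry = 0  # the controller's entire memory, independent of input length
    while (pair := tape.read()) is not None:
        total = pair[0] + pair[1] + carry
        tape.write_and_advance(total % 10)
        carry = total // 10
    if carry:
        tape.write_and_advance(carry)
    return tape.result()


print(add_with_tool("999", "1"))
```

The per-step logic is identical for 3-digit and 3,000-digit inputs, which is the intuition behind length generalization in this toy setting; without the tape, the controller would have to hold the whole output in its fixed-size state.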

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

Shows that tool use enables state space models to achieve length generalization, overcoming a limit imposed by their fixed-size memory.

Contributions · Auto-generated by claude-haiku-4-5-20251001

A theoretical result showing that SSMs cannot accurately solve "truly long-form" generation problems without tools
A demonstration that, given the right tool access and problem-dependent training data, SSMs can learn to solve any tractable problem and generalize to arbitrary problem length
Empirical evidence that tool-augmented SSMs achieve remarkable length generalization on arithmetic, reasoning, and coding tasks, positioning them as a potentially efficient alternative to Transformers

Methods used · Auto-generated by claude-haiku-4-5-20251001

State space models
Tool use
Length generalization
Transformers
Language modeling

Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit limitations.

Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001

The authors encourage the development of tool-based SSMs operating in agentic settings such as coding, search, or reasoning.

Author keywords

State Space Models
Mamba
Length Generalization
LLM
Transformers

Related orals

Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models

Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer

Invisible Safety Threat: Malicious Finetuning for LLM via Steganography

Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents

RefineStat: Efficient Exploration for Probabilistic Program Synthesis