ICLR 2026 Orals

Softmax Transformers are Turing-Complete

Hongjian Jiang, Michael Hahn, Georg Zetzsche, Anthony Widjaja Lin

LLMs & Reasoning · Fri, Apr 24 · 11:30 AM–11:40 AM · 202 A/B · Avg rating: 5.50 (2–10)

Abstract

Hard-attention Chain-of-Thought (CoT) transformers are known to be Turing-complete. However, whether softmax-attention CoT transformers are Turing-complete has remained an open problem. In this paper, we prove the stronger result that length-generalizable softmax CoT transformers are Turing-complete.
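For intuition, here is a minimal sketch (in NumPy; not the paper's construction) of the two ingredients the claim concerns: a single causal softmax attention step, and the chain-of-thought loop that feeds the model's own outputs back as input. The function name and shapes are illustrative assumptions.

    import numpy as np

    def causal_softmax_attention(Q, K, V):
        # Position i attends only to positions j <= i (causal mask),
        # with soft (softmax) weights rather than a hard argmax.
        n, d = Q.shape
        scores = Q @ K.T / np.sqrt(d)                 # (n, n) similarities
        mask = np.triu(np.ones((n, n)), k=1)          # 1 strictly above diagonal
        scores = np.where(mask == 1, -np.inf, scores)
        w = np.exp(scores - scores.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        return w @ V                                  # convex mix of value vectors

    # Chain-of-thought decoding: each generated token is appended to the
    # input and the transformer is re-run, so the writable "scratchpad"
    # grows step by step; that growing scratchpad is what
    # Turing-completeness constructions exploit.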

More precisely, our Turing-completeness proof goes via the CoT extension of the Counting RASP (C-RASP), which corresponds to softmax CoT transformers that admit length generalization. We prove Turing-completeness for CoT C-RASP with causal masking over a unary alphabet (and, more generally, for letter-bounded languages). While we show that this variant is not Turing-complete for arbitrary languages, we prove that its extension with relative positional encoding is. We empirically validate our theoretical results by training transformers on various languages that require complex (non-linear) arithmetic reasoning.
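To make the C-RASP route concrete, here is a hedged, simplified illustration (not the paper's program) of the core counting primitive: each position computes counts over its causal prefix, and Boolean tests compare those counts. The helper names and the example language {a^n b^n} are assumptions for illustration only.

    def count_prefix(tokens, pred):
        # C-RASP-style counting term: at position i, the number of
        # positions j <= i whose token satisfies pred.
        out, c = [], 0
        for t in tokens:
            c += pred(t)
            out.append(c)
        return out

    def balanced_ab(word):
        # Recognize {a^n b^n}: a-counts dominate b-counts on every
        # prefix, no a occurs after a b, and the final totals match.
        a = count_prefix(word, lambda t: t == "a")
        b = count_prefix(word, lambda t: t == "b")
        ordered = all(not (word[i] == "a" and b[i] > 0) for i in range(len(word)))
        dominated = all(a[i] >= b[i] for i in range(len(word)))
        return ordered and dominated and (not word or a[-1] == b[-1])

    assert balanced_ab("aabb") and not balanced_ab("abab")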

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

Proves length-generalizable softmax transformers with chain-of-thought and relative positional encoding are Turing-complete.

Contributions · Auto-generated by claude-haiku-4-5-20251001
  • Proves length-generalizable softmax CoT transformers are Turing-complete
  • Shows softmax CoT C-RASP with causal masking is not Turing-complete for arbitrary languages
  • Demonstrates extension with relative positional encoding is Turing-complete for arbitrary languages
  • Validates theoretical results through training on languages requiring complex arithmetic reasoning
Methods used · Auto-generated by claude-haiku-4-5-20251001
  • Turing completeness
  • Transformers
  • Relative positional encoding
Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit limitations.

Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • Refine Turing-completeness with complexity class distinctions
  • Study relationship between CoT steps and complexity classes

Author keywords

  • soft attention
  • FLaNN
  • recursively enumerable
  • Turing-complete
  • formal languages
