Softmax Transformers are Turing-Complete
Hongjian Jiang, Michael Hahn, Georg Zetzsche, Anthony Widjaja Lin
Abstract
Hard attention Chain-of-Thought (CoT) transformers are known to be Turing-complete. However, it is an open problem whether softmax attention CoT transformers are Turing-complete. In this paper, we prove the stronger result that length-generalizable softmax CoT transformers are Turing-complete.
More precisely, our Turing-completeness proof goes via the CoT extension of the Counting RASP (C-RASP), which corresponds to softmax CoT transformers that admit length generalization. We prove Turing-completeness for CoT C-RASP with causal masking over a unary alphabet (and, more generally, over letter-bounded languages). We show that this variant is in fact not Turing-complete for arbitrary languages, but prove that its extension with relative positional encoding is Turing-complete for arbitrary languages. We empirically validate our theoretical results by training transformers on various languages that require complex (non-linear) arithmetic reasoning.
Proves length-generalizable softmax transformers with chain-of-thought and relative positional encoding are Turing-complete.
- Proves length-generalizable softmax CoT transformers are Turing-complete
- Shows CoT C-RASP with causal masking is Turing-complete over a unary alphabet, but not for arbitrary languages
- Demonstrates extension with relative positional encoding is Turing-complete for arbitrary languages
- Validates theoretical results through training on languages requiring complex arithmetic reasoning
- Turing completeness
- Transformers
- Relative positional encoding
Authors did not state explicit limitations.
- Refine Turing-completeness with complexity class distinctions (from the paper)
- Study relationship between CoT steps and complexity classes (from the paper)
Author keywords
- soft attention
- FLaNN
- recursively enumerable
- Turing-complete
- formal languages
Related orals
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Benchmarks practical privacy risks of LLMs adapted with differential privacy, revealing that distribution shifts and model choice impact protection effectiveness.
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
Proposes Recursive Likelihood Ratio optimizer for efficient fine-tuning of diffusion models with lower variance gradient estimation.
Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
Demonstrates LLMs can be finetuned to generate harmful steganographically-hidden outputs while appearing benign to safety systems.
Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents
Proposes T3 algorithm to detect belief deviation in LLM agents and truncate trajectories for improved reinforcement learning in active reasoning tasks.
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
RefineStat enforces semantic constraints and applies diagnostic-aware refinement for synthesizing valid probabilistic programs from smaller language models.