Softmax Transformers are Turing-Complete
Hongjian Jiang, Michael Hahn, Georg Zetzsche, Anthony Widjaja Lin
Abstract
Hard attention Chain-of-Thought (CoT) transformers are known to be Turing-complete. However, it is an open problem whether softmax attention CoT transformers are Turing-complete. In this paper, we prove the stronger result that length-generalizable softmax CoT transformers are Turing-complete.
More precisely, our Turing-completeness proof goes via the CoT extension of the Counting RASP (C-RASP), which corresponds to softmax CoT transformers that admit length generalization. We prove Turing-completeness for CoT C-RASP with causal masking over a unary alphabet (and, more generally, over letter-bounded languages). We show that this variant is in fact not Turing-complete for arbitrary languages, but prove that its extension with relative positional encoding is Turing-complete for arbitrary languages. We empirically validate our theoretical results by training transformers on various languages that require complex (non-linear) arithmetic reasoning.
Proves length-generalizable softmax transformers with chain-of-thought and relative positional encoding are Turing-complete.
- Proves length-generalizable softmax CoT transformers are Turing-complete
- Shows CoT C-RASP with causal masking is Turing-complete over a unary alphabet, but not for arbitrary languages
- Demonstrates extension with relative positional encoding is Turing-complete for arbitrary languages
- Validates theoretical results through training on languages requiring complex arithmetic reasoning
- Turing completeness
- Transformers
- Relative positional encoding
Authors did not state explicit limitations.
- Refine Turing-completeness with complexity class distinctions (from the paper)
- Study relationship between CoT steps and complexity classes (from the paper)
Author keywords
- soft attention
- FLaNN
- recursively enumerable
- Turing-complete
- formal languages
Related orals
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Benchmarks practical privacy risks of LLMs adapted with differential privacy, revealing that distribution shifts and model choice impact protection effectiveness.
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
Proposes Recursive Likelihood Ratio optimizer for efficient fine-tuning of diffusion models with lower variance gradient estimation.
Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
Demonstrates LLMs can be finetuned to generate harmful steganographically-hidden outputs while appearing benign to safety systems.
Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents
Proposes T3 algorithm to detect belief deviation in LLM agents and truncate trajectories for improved reinforcement learning in active reasoning tasks.
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
RefineStat enforces semantic constraints and applies diagnostic-aware refinement for synthesizing valid probabilistic programs from smaller language models.