Transformers are Inherently Succinct
Pascal Bergsträßer, Ryan Cotterell, Anthony Widjaja Lin
Abstract
We propose succinctness as a measure of expressive power of a transformer in describing a concept. To this end, we prove that transformers are highly expressive in that they can represent formal languages substantially more succinctly than standard representations of formal languages like finite automata and Linear Temporal Logic (LTL) formulas. As a by-product of this expressivity, verifying even simple properties of transformers is shown to be provably intractable (i.e. EXPSPACE-complete).
Proves transformers with unique-hard attention are exponentially more succinct than finite automata and LTL formulas but verification is EXPSPACE-complete.
- Demonstrates transformers represent formal languages substantially more succinctly than automata
- Proves verification of simple transformer properties is EXPSPACE-complete
- Establishes succinctness as measure of expressive power for transformers
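The succinctness gap the paper formalizes for transformers has a classical analogue in automata theory, which may help build intuition. A toy sketch (not from the paper): a conjunction of small modular counters describes a unary language with total description size equal to the *sum* of the moduli, while the minimal DFA for the same language needs lcm-many states, exponentially more when the moduli are distinct primes.

```python
from math import lcm

# Illustration (not from the paper): for the unary language
#   L = { a^n : n is divisible by every m in MODULI },
# a factored description (one mod-m counter per modulus) has size
# roughly sum(MODULI), but the minimal DFA is a single cycle of
# lcm(MODULI) states -- the product of the moduli when they are
# pairwise coprime.
MODULI = [2, 3, 5, 7, 11]

factored_size = sum(MODULI)   # size of the succinct, factored description
dfa_size = lcm(*MODULI)       # states in the minimal DFA for L

print(factored_size, dfa_size)  # 28 vs 2310
```

The paper's result is of this flavor but stronger: unique-hard-attention transformers can be exponentially more succinct than both finite automata and LTL formulas, and that compression is exactly what pushes verification up to EXPSPACE.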
- Formal language theory
- Transformer verification
- Computational complexity
Authors did not state explicit limitations.
Future directions (from the paper)
- Develop automatic tools for analyzing, verifying, and explaining transformers
- Exploit techniques from automated verification to verify transformers in practice
- Study subclasses of transformers that cannot encode large counters, for lower verification complexity
- Investigate learnability of succinct transformers
- Study succinctness of fixed finite-precision softmax and average-hard attention transformers
Author keywords
- Transformers
- complexity
- expressivity
- automata
- LTL
Related orals
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Benchmarks practical privacy risks in differential privacy-adapted LLMs, revealing distribution shifts and model choice impact effectiveness.
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
Proposes Recursive Likelihood Ratio optimizer for efficient fine-tuning of diffusion models with lower variance gradient estimation.
Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
Demonstrates LLMs can be finetuned to generate harmful steganographically-hidden outputs while appearing benign to safety systems.
Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents
Proposes T3 algorithm to detect belief deviation in LLM agents and truncate trajectories for improved reinforcement learning in active reasoning tasks.
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
RefineStat enforces semantic constraints and applies diagnostic-aware refinement for synthesizing valid probabilistic programs from smaller language models.