ICLR 2026 Orals

Transformers are Inherently Succinct

Pascal Bergsträßer, Ryan Cotterell, Anthony Widjaja Lin

LLMs & Reasoning Fri, Apr 24 · 4:27 PM–4:37 PM · Amphitheater Avg rating: 7.00 (4–8)

Abstract

We propose succinctness as a measure of the expressive power of a transformer in describing a concept. To this end, we prove that transformers are highly expressive in that they can represent formal languages substantially more succinctly than standard representations such as finite automata and Linear Temporal Logic (LTL) formulas. As a by-product of this expressivity, verifying even simple properties of transformers is shown to be provably intractable (i.e., EXPSPACE-complete).
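To get intuition for a succinctness gap, a classic example (not the paper's construction, just a standard illustration from automata theory) is the family L_n = { w over {a,b} : the n-th symbol from the end is 'a' }. A nondeterministic automaton for L_n needs only about n states, but any deterministic automaton must remember the last n symbols and so needs 2^n states. The sketch below determinizes the small NFA by the subset construction and counts the reachable DFA states:

```python
def reachable_dfa_states(n):
    """Determinize the (n+1)-state NFA for L_n via the subset
    construction and count the reachable DFA states (expected: 2**n)."""

    # NFA: state 0 self-loops on both symbols; on 'a' it may also guess
    # that this 'a' is the n-th symbol from the end and move to state 1.
    # States 1..n-1 advance on any symbol; state n (accepting) has no
    # outgoing transitions, so the guess dies unless the word ends there.
    def step(subset, sym):
        out = set()
        for q in subset:
            if q == 0:
                out.add(0)
                if sym == 'a':
                    out.add(1)
            elif q < n:
                out.add(q + 1)
        return frozenset(out)

    # Breadth-first exploration of the DFA's reachable subset-states.
    start = frozenset({0})
    seen, frontier = {start}, [start]
    while frontier:
        s = frontier.pop()
        for sym in 'ab':
            t = step(s, sym)
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return len(seen)

for n in range(1, 7):
    print(n, reachable_dfa_states(n))  # second column grows as 2**n
```

The paper's results are of this flavor but stronger: transformers (with unique-hard attention) can be exponentially more succinct than both finite automata and LTL formulas, and that compression is exactly what makes their verification EXPSPACE-hard.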

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

Proves that transformers with unique-hard attention are exponentially more succinct than finite automata and LTL formulas, but that verifying them is EXPSPACE-complete.

Contributions
  • Demonstrates that transformers represent formal languages substantially more succinctly than automata
  • Proves that verification of even simple transformer properties is EXPSPACE-complete
  • Establishes succinctness as a measure of expressive power for transformers
Methods used
  • Formal language theory
  • Transformer verification
  • Computational complexity
Limitations (author-stated)

Authors did not state explicit limitations.

Future work (author-stated)
  • Develop automatic tools for analyzing, verifying, and explaining transformers
  • Exploit techniques from automated verification to verify transformers in practice
  • Study subclasses of transformers that cannot encode large counters, for lower verification complexity
  • Investigate learnability of succinct transformers
  • Study succinctness of fixed finite-precision softmax and average-hard attention transformers

Author keywords

  • Transformers
  • complexity
  • expressivity
  • automata
  • LTL
