ICLR 2026 Orals

Every Language Model Has a Forgery-Resistant Signature

Matthew Finlayson, Xiang Ren, Swabha Swayamdipta

Safety, Privacy & Alignment Thu, Apr 23 · 11:42 AM–11:52 AM · 201 C Avg rating: 6.00 (4–8)
Author-provided TL;DR

We show that all language models impose elliptical constraints on their outputs, which can be used as a hard-to-fake signature to identify a model from its outputs.

Abstract

The ubiquity of closed-weight language models with public-facing APIs has generated interest in forensic methods, both for extracting hidden model details (e.g., parameters) and identifying models by their outputs. One successful approach to these goals has been to exploit the geometric constraints imposed by the language model architecture and parameters. In this work, we show that a lesser-known geometric constraint—namely that language model outputs lie on the surface of a high-dimensional ellipse—functions as a signature for the model, which can be used to identify which model an output came from. This ellipse signature has unique properties that distinguish it from existing model-output association methods like language model watermarks. In particular, the signature is hard to forge: without direct access to model parameters, it is practically infeasible to produce logprobs on the ellipse. Secondly, the signature is naturally occurring, since all language models have these elliptical constraints. Thirdly, the signature is self-contained, in that it is detectable without access to the model input or full weights. Finally, the signature is exceptionally redundant, as it is independently detectable in every single logprob output from the model. We evaluate a novel technique for extracting the ellipse on small models, and discuss the practical hurdles that make it infeasible for production-size models, making the signature hard to forge. Lastly, we use ellipse signatures to propose a protocol for language model output verification, which is analogous to cryptographic symmetric-key message authentication systems.
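The geometric constraint at the heart of the abstract can be illustrated with a toy sketch. The idea (simplified here, and not the paper's actual extraction method): a final hidden state that is normalized to unit norm lies on a sphere, so its image under the unembedding matrix lies on an ellipse, and membership on that ellipse can be checked with the matrix pseudoinverse. All names and dimensions below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, V = 16, 100  # toy hidden size and vocab size

# Stand-in unembedding matrix (full column rank with high probability).
W = rng.normal(size=(V, d))

# A normalized hidden state lies on the unit sphere in R^d.
h = rng.normal(size=d)
h /= np.linalg.norm(h)

# Its logits are the image of a sphere point under W,
# and therefore lie on a d-dimensional ellipse in R^V.
logits = W @ h

# Ellipse membership check: mapping the logits back through the
# pseudoinverse of W must recover a unit-norm point.
recovered = np.linalg.pinv(W) @ logits
on_ellipse = abs(np.linalg.norm(recovered) - 1.0) < 1e-8
print(on_ellipse)
```

Verifying this check requires knowing W (or an equivalent parameterization of the ellipse), which is why, per the abstract, the signature is hard to forge without access to model parameters.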

One-sentence summary·Auto-generated by claude-haiku-4-5-20251001

Ellipse signatures function as forgery-resistant model output identifiers based on high-dimensional geometric constraints.

Contributions·Auto-generated by claude-haiku-4-5-20251001
  • Identifies that language model outputs lie on a high-dimensional ellipse surface that serves as a model signature
  • Demonstrates the signature is hard to forge without direct access to model parameters
  • Proposes a protocol for language model output verification analogous to cryptographic authentication
  • Shows the signature is naturally occurring and self-contained, requiring neither the model input nor full weights
Methods used·Auto-generated by claude-haiku-4-5-20251001
  • Geometric constraint analysis
  • Ellipse extraction
  • Model fingerprinting
Limitations (author-stated)·Auto-generated by claude-haiku-4-5-20251001
  • Hardness of forgery is only polynomial, far from a cryptographic security guarantee
  • The proposed protocol requires the API to provide logprobs, which only a few major providers do
  • The signature is not difficult to remove, since modifying outputs or parameters breaks the ellipse constraints
Future work (author-stated)·Auto-generated by claude-haiku-4-5-20251001
  • Identify other constraints on model outputs that give stronger security guarantees
  • Explore signatures that are difficult to remove, for use as model fingerprints

Author keywords

  • fingerprint
  • watermark
  • language model
  • signature
  • accountability
  • cryptography
  • forgery
  • security
