Every Language Model Has a Forgery-Resistant Signature
Matthew Finlayson, Xiang Ren, Swabha Swayamdipta
We show that all language models impose elliptical constraints on their outputs, which can be used as a hard-to-fake signature to identify a model from its outputs.
Abstract
The ubiquity of closed-weight language models with public-facing APIs has generated interest in forensic methods, both for extracting hidden model details (e.g., parameters) and for identifying models by their outputs. One successful approach to these goals has been to exploit the geometric constraints that a language model's architecture and parameters impose on its outputs. In this work, we show that a lesser-known geometric constraint, namely that language model outputs lie on the surface of a high-dimensional ellipse, functions as a signature for the model, which can be used to identify which model an output came from. This ellipse signature has unique properties that distinguish it from existing model-output association methods like language model watermarks. First, the signature is hard to forge: without direct access to the model parameters, it is practically infeasible to produce logprobs on the ellipse. Second, the signature is naturally occurring, since all language models have these elliptical constraints. Third, the signature is self-contained, in that it is detectable without access to the model input or the full model weights. Fourth, the signature is exceptionally redundant, as it is independently detectable in every single logprob output from the model. We evaluate a novel technique for extracting the ellipse on small models, and discuss the practical hurdles that make extraction infeasible for production-size models, which is what makes the signature hard to forge. Finally, we use ellipse signatures to propose a protocol for language model output verification, analogous to cryptographic symmetric-key message authentication systems.
Ellipse signatures function as forgery-resistant model output identifiers based on high-dimensional geometric constraints.
- Identifies that language model outputs lie on the surface of a high-dimensional ellipse, which serves as a model signature
- Demonstrates that the signature is hard to forge without direct access to model parameters
- Proposes a protocol for language model output verification analogous to cryptographic authentication
- Shows the signature is naturally occurring and self-contained, requiring neither the model input nor the full weights
- Geometric constraint analysis
- Ellipse extraction
- Model fingerprinting
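The elliptical constraint behind these points comes from the model's final normalization layer: normalization places hidden states on a sphere, the elementwise gain stretches that sphere into an ellipse, and the linear unembedding carries the ellipse into logit space. A minimal NumPy sketch of this geometry (an illustration under these assumptions, not the paper's extraction technique):

```python
import numpy as np

# Toy model pieces (all shapes and values here are illustrative assumptions).
rng = np.random.default_rng(0)
d, v = 8, 32                        # hidden size, vocabulary size
gamma = rng.uniform(0.5, 2.0, d)    # LayerNorm elementwise gain
W = rng.normal(size=(v, d))         # unembedding matrix

def hidden_on_ellipse(x):
    # LayerNorm without bias: center, scale to norm sqrt(d), apply gain.
    # The centered, rescaled vector lies on a sphere; gamma makes it an ellipse.
    x = x - x.mean()
    x = x / np.linalg.norm(x) * np.sqrt(d)
    return gamma * x

H = np.stack([hidden_on_ellipse(rng.normal(size=d)) for _ in range(100)])

# Every normalized hidden state satisfies the same quadratic (ellipse) equation:
q = np.sum((H / gamma) ** 2, axis=1)
assert np.allclose(q, d)

# Logits inherit the constraint through the linear map W: recovering the
# hidden states from logits by least squares lands back on the same ellipse.
logits = H @ W.T
H_rec = np.linalg.lstsq(W, logits.T, rcond=None)[0].T
assert np.allclose(np.sum((H_rec / gamma) ** 2, axis=1), d)
print("all outputs lie on the model's ellipse")
```

Because the ellipse is determined by `gamma` and `W`, checking whether a logprob vector satisfies this equation requires knowing (or extracting) those parameters, which is what makes the constraint usable as a signature.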
Limitations
- Hardness of forgery is only polynomial, far from a cryptographic security guarantee
- The proposed verification protocol requires the API to provide logprobs, which only a few major providers do
- The signature is not difficult to remove, since modifying outputs or parameters breaks the ellipse constraints

Future directions
- Identify other constraints on model outputs that give stronger security guarantees
- Explore signatures that are difficult to remove as model fingerprints
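The verification protocol proposed in the abstract treats the model's ellipse like the shared key in a symmetric-key message authentication scheme: a verifier who knows the ellipse accepts an output only if it satisfies the ellipse equation. A toy sketch of that check (the names and simplified geometry are illustrative assumptions, not the paper's exact protocol):

```python
import numpy as np

# The verifier holds the secret ellipse (center c, positive-definite shape A).
rng = np.random.default_rng(1)
d = 16
c = rng.normal(size=d)              # ellipse center (secret)
M = rng.normal(size=(d, d))
A = M @ M.T + np.eye(d)             # positive-definite shape matrix (secret)

def on_ellipse(x, tol=1e-8):
    # Accept iff (x - c)^T A (x - c) == 1, up to numerical tolerance.
    r = x - c
    return abs(r @ A @ r - 1.0) < tol

def sample_on_ellipse():
    # A genuine "model output": a point exactly on the ellipse surface,
    # built from a unit vector via the Cholesky factor of A.
    L = np.linalg.cholesky(A)
    u = rng.normal(size=d)
    return c + np.linalg.solve(L.T, u / np.linalg.norm(u))

genuine = sample_on_ellipse()
forged = genuine + rng.normal(scale=1e-3, size=d)  # tampered output

print(on_ellipse(genuine))  # expected: True
print(on_ellipse(forged))   # expected: False, tampering leaves the surface
```

The analogy to a MAC is that verification is cheap for the key holder, while producing a point on the ellipse without the secret parameters is (per the paper's limitations) only polynomially hard rather than cryptographically hard.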
Author keywords
- fingerprint
- watermark
- language model
- signature
- accountability
- cryptography
- forgery
- security
Related orals
LLM Fingerprinting via Semantically Conditioned Watermarks
Introduces semantically conditioned watermarks for stealthy LLM fingerprinting that remains robust across deployment scenarios.
Steering the Herd: A Framework for LLM-based Control of Social Learning
Framework studying strategic control of social learning by algorithmic information mediators with theoretical analysis and LLM-based simulations.
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
Analyzes machine unlearning in high dimensions, showing that a single noisy Newton step with Gaussian noise suffices for a favorable privacy-accuracy tradeoff.
Differentially Private Domain Discovery
WGM-based methods provide efficient domain discovery with near-optimal guarantees for missing mass on Zipfian data.
What's In My Human Feedback? Learning Interpretable Descriptions of Preference Data
WIMHF uses sparse autoencoders to extract human-interpretable features from preference data, enabling better understanding and curation of human feedback.