LLM Fingerprinting via Semantically Conditioned Watermarks
Thibaud Gloaguen, Robin Staab, Nikola Jovanović, Martin Vechev
We introduce a robust LLM fingerprinting method based on semantically conditioned watermarks
Abstract
Most LLM fingerprinting methods teach the model to respond to a few fixed queries with predefined atypical responses (keys). This memorization often does not survive common deployment steps such as finetuning or quantization, and such keys can be easily detected and filtered from LLM responses, ultimately breaking the fingerprint. To overcome these limitations, we introduce *LLM fingerprinting via semantically conditioned watermarks*, replacing fixed query sets with a broad semantic domain, and replacing brittle atypical keys with a statistical watermarking signal diffused throughout each response. After teaching the model to watermark its responses only to prompts from a predetermined domain (e.g., the French language), the model owner can use queries from that domain to reliably detect the fingerprint and verify ownership. As we confirm in our thorough experimental evaluation, our fingerprint is both stealthy and robust to all common deployment scenarios.
Introduces semantically conditioned watermarks for stealthy LLM fingerprinting that remains robust across common deployment scenarios.
- Novel fingerprinting method using domain-specific watermarks instead of fixed query-response pairs
- Watermarking signal diffused throughout responses rather than in atypical keys
- Robust to common deployment steps like finetuning and quantization
- Statistical watermarking
- Semantic domain conditioning
- Language model fine-tuning
- AlpacaGPT4
- OpenWebText
- OpenMathInstruct
- C4
- GSM8K
- Wikipedia
- WildChat
The method requires selecting a semantic domain on which the model's distribution is distorted, which may degrade performance for some users.
Fingerprint stealth relies partly on adversaries not knowing the semantic domain beforehand; if the domain is known, adversaries could prevent detection by blocking related queries.
Authors did not state explicit future directions.
Author keywords
- LLM
- Watermarks
- Fingerprinting
Related orals
Steering the Herd: A Framework for LLM-based Control of Social Learning
Framework studying strategic control of social learning by algorithmic information mediators with theoretical analysis and LLM-based simulations.
Every Language Model Has a Forgery-Resistant Signature
Ellipse signatures function as forgery-resistant model output identifiers based on high-dimensional geometric constraints.
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
Analyzes machine unlearning in high dimensions, showing that a single noisy Newton step with Gaussian noise suffices for the privacy-accuracy tradeoff.
Differentially Private Domain Discovery
WGM-based methods provide efficient domain discovery with near-optimal guarantees for missing mass on Zipfian data.
What's In My Human Feedback? Learning Interpretable Descriptions of Preference Data
WIMHF uses sparse autoencoders to extract human-interpretable features from preference data, enabling better understanding and curation of human feedback.