ICLR 2026 Orals

LLM Fingerprinting via Semantically Conditioned Watermarks

Thibaud Gloaguen, Robin Staab, Nikola Jovanović, Martin Vechev

Safety, Privacy & Alignment · Thu, Apr 23 · 11:18 AM–11:28 AM · 201 C · Avg rating: 6.50 (6–8)
Author-provided TL;DR

We introduce a robust LLM fingerprinting method based on semantically conditioned watermarks.

Abstract

Most LLM fingerprinting methods teach the model to respond to a few fixed queries with predefined atypical responses (keys). This memorization often does not survive common deployment steps such as finetuning or quantization, and such keys can be easily detected and filtered from LLM responses, ultimately breaking the fingerprint. To overcome these limitations, we introduce *LLM fingerprinting via semantically conditioned watermarks*, replacing fixed query sets with a broad semantic domain, and replacing brittle atypical keys with a statistical watermarking signal diffused throughout each response. After teaching the model to watermark its responses only to prompts from a predetermined domain (e.g., the French language), the model owner can use queries from that domain to reliably detect the fingerprint and verify ownership. As we confirm in our thorough experimental evaluation, our fingerprint is both stealthy and robust to all common deployment scenarios.
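
To make the detection step concrete, the following is a minimal sketch of how a statistical watermark could be tested for across several responses to in-domain queries. It assumes a red-green (green-list) watermark scored with a one-sided z-test; the abstract does not say which watermarking scheme the authors use, so this is an illustrative stand-in and not the paper's method. All names here (`is_green`, `pooled_z_score`, `fingerprint_detected`, `toy_watermarked_response`), the key string, `gamma`, and the threshold of 4.0 are hypothetical.

```python
import hashlib
import math
from typing import Iterable, List, Sequence


def is_green(prev_token: int, token: int, key: str, gamma: float = 0.5) -> bool:
    """Pseudorandomly place `token` on the green list, seeded by the key and previous token.

    ASSUMPTION: a red-green scheme with green fraction `gamma`; not taken from the paper.
    """
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # uniform value in [0, 1)
    return u < gamma


def pooled_z_score(responses: Iterable[Sequence[int]], key: str, gamma: float = 0.5) -> float:
    """z-score of the green-token count, pooled over all responses to in-domain queries."""
    green = 0
    scored = 0
    for tokens in responses:
        for prev, tok in zip(tokens, tokens[1:]):
            green += is_green(prev, tok, key, gamma)
            scored += 1
    if scored == 0:
        raise ValueError("no token pairs to score")
    # Without a watermark, the green count is approximately Binomial(scored, gamma).
    return (green - gamma * scored) / math.sqrt(gamma * (1.0 - gamma) * scored)


def fingerprint_detected(responses: Iterable[Sequence[int]], key: str,
                         gamma: float = 0.5, threshold: float = 4.0) -> bool:
    """Flag the fingerprint when the pooled z-score exceeds a (hypothetical) threshold."""
    return pooled_z_score(responses, key, gamma) > threshold


def toy_watermarked_response(length: int, key: str, vocab_size: int = 1000,
                             gamma: float = 0.5) -> List[int]:
    """Toy stand-in for a watermarked model: always emit the first green token id."""
    tokens = [0]
    while len(tokens) < length:
        prev = tokens[-1]
        tokens.append(next(t for t in range(vocab_size) if is_green(prev, t, key, gamma)))
    return tokens


# Usage: the owner queries the suspect model with prompts from the chosen domain,
# tokenizes the responses, and runs the pooled test with the secret key.
responses = [toy_watermarked_response(201, key="owner-secret")]
print(fingerprint_detected(responses, key="owner-secret"))
```

Pooling token pairs across many in-domain responses reflects the idea that the signal is spread throughout each response rather than concentrated in a single key string, so more queries give a stronger test.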

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

Introduces semantically conditioned watermarks for LLM fingerprinting that is stealthy and robust to common deployment steps such as finetuning and quantization.

Contributions · Auto-generated by claude-haiku-4-5-20251001
  • Novel fingerprinting method using domain-specific watermarks instead of fixed query-response pairs
  • Watermarking signal diffused throughout responses rather than in atypical keys
  • Robust to common deployment steps like finetuning and quantization
Methods used · Auto-generated by claude-haiku-4-5-20251001
  • Statistical watermarking
  • Semantic domain conditioning
  • Language model fine-tuning
Datasets used · Auto-generated by claude-haiku-4-5-20251001
  • AlpacaGPT4
  • OpenWebText
  • OpenMathInstruct
  • C4
  • GSM8K
  • Wikipedia
  • WildChat
Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • The method requires choosing a semantic domain in which the model's output distribution is distorted, which may degrade performance for some users
    from the paper
  • Fingerprint stealth relies partly on adversaries not knowing the semantic domain beforehand; if the domain is known, adversaries could prevent detection by blocking related queries
    from the paper
Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit future directions.

Author keywords

  • LLM
  • Watermarks
  • Fingerprinting
