Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Bartłomiej Marek, Lorenzo Rossi, Vincent Hanke, Xun Wang, Michael Backes, Franziska Boenisch, Adam Dziedzic
DP adaptations of LLMs can leak data in practice, with risk rising as adaptation data becomes closer to the pretraining distribution.
Abstract
Recent work has applied differential privacy (DP) to adapt large language models (LLMs) for sensitive applications, offering theoretical guarantees. However, its practical effectiveness remains unclear, partly due to LLM pretraining, where overlaps and interdependencies with adaptation data can undermine privacy despite DP efforts. To analyze this issue in practice, we investigate privacy risks under DP adaptations in LLMs using state-of-the-art attacks such as robust membership inference and canary data extraction. We benchmark these risks by systematically varying the adaptation data distribution, from exact overlaps with pretraining data, through in-distribution (IID) cases, to entirely out-of-distribution (OOD) examples. Additionally, we evaluate how different adaptation methods and different privacy regimes impact the vulnerability. Our results show that distribution shifts strongly influence privacy vulnerability: the closer the adaptation data is to the pretraining distribution, the higher the practical privacy risk at the same theoretical guarantee, even without direct data overlap. We find that parameter-efficient fine-tuning methods, such as LoRA, achieve the highest empirical privacy protection for OOD data. Our benchmark identifies key factors for achieving practical privacy in DP LLM adaptation, providing actionable insights for deploying customized models in sensitive settings. Looking forward, we propose a structured framework for holistic privacy assessment beyond adaptation privacy, to identify and evaluate risks across the full pretrain-adapt pipeline of LLMs.
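The robust membership inference attacks used in the benchmark can be illustrated with a minimal loss-threshold variant (a hedged sketch with toy numbers, not the paper's exact attack): examples on which the adapted model has unusually low loss are flagged as likely training members, with the threshold calibrated on known non-members.

```python
# Minimal loss-threshold membership inference sketch (illustrative only;
# the paper's robust MIA is more sophisticated). Lower loss on an example
# suggests the model saw it during adaptation.

def calibrate_threshold(nonmember_losses, fpr=0.1):
    """Pick a threshold so that at most `fpr` of known non-members are flagged."""
    ranked = sorted(nonmember_losses)
    k = max(0, int(fpr * len(ranked)) - 1)
    return ranked[k]

def mia_predict_members(losses, threshold):
    """Flag indices whose per-example loss falls below the threshold."""
    return [i for i, loss in enumerate(losses) if loss < threshold]

# Toy per-example losses: true members tend to have lower loss.
member_losses = [0.2, 0.3, 0.25, 0.9]
nonmember_losses = [1.1, 0.8, 1.4, 0.95, 1.2]

thr = calibrate_threshold(nonmember_losses, fpr=0.2)
print(mia_predict_members(member_losses, thr))  # flags the low-loss examples
```

The benchmark's finding is that, at a fixed DP guarantee, such attacks succeed more often when the adaptation data lies close to the pretraining distribution.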
Benchmarks practical privacy risks in DP-adapted LLMs, revealing that distribution shifts and the choice of adaptation method determine how effective the protection is in practice.
- First systematic empirical analysis of privacy risks under DP adaptations via membership inference and data extraction attacks
- Demonstrates that distribution closeness between pretraining and adaptation data determines practical privacy vulnerability
- Shows LoRA enables higher empirical privacy protection for OOD data compared to other fine-tuning methods
- Proposes holistic privacy assessment framework spanning the full pretrain-adapt pipeline
- Differential privacy
- Membership inference attacks
- Data extraction attacks
- LoRA
- Prefix tuning
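The canary data extraction attack listed above can be sketched as follows (a hedged illustration with hypothetical names, not the paper's exact protocol): unique "canary" strings are planted in the adaptation data, and after fine-tuning the auditor measures how many secret suffixes the model reproduces from their prefixes.

```python
# Illustrative canary-extraction audit. A higher extraction rate means the
# adapted model memorized more of the planted secrets, i.e. weaker empirical
# privacy despite the DP guarantee.

def extraction_rate(canaries, generate, prefix_len=4):
    """Fraction of canaries whose secret suffix the model regenerates."""
    hits = 0
    for canary in canaries:
        prefix, secret = canary[:prefix_len], canary[prefix_len:]
        if generate(prefix) == secret:
            hits += 1
    return hits / len(canaries)

# Toy "model": a lookup table that memorized two of three planted canaries.
memorized = {"AB12": "xy", "CD34": "zw"}
canaries = ["AB12xy", "CD34zw", "EF56qq"]
print(extraction_rate(canaries, memorized.get))  # fraction of canaries extracted
```

In the benchmark, comparing such rates across adaptation methods is what shows parameter-efficient approaches like LoRA leaking less on OOD data.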
Limitations
- The work focuses solely on auditing private adaptations and leakage from pretraining data after adaptation (from the paper)
- For holistic privacy auditing, methods that audit all process stages jointly are needed (from the paper)
- Evaluates only a subset of models, leaving out state-of-the-art closed models such as GPT-4 due to API constraints (from the paper)
Authors did not state explicit future directions.
Author keywords
- privacy
- llm
- adaptations
- auditing
- differential privacy
Related orals
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
Proposes a Recursive Likelihood Ratio optimizer for efficient fine-tuning of diffusion models with lower-variance gradient estimation.
Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
Demonstrates LLMs can be finetuned to generate harmful steganographically-hidden outputs while appearing benign to safety systems.
Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents
Proposes T3 algorithm to detect belief deviation in LLM agents and truncate trajectories for improved reinforcement learning in active reasoning tasks.
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
RefineStat enforces semantic constraints and applies diagnostic-aware refinement for synthesizing valid probabilistic programs from smaller language models.
Actions Speak Louder than Prompts: A Large-Scale Study of LLMs for Graph Inference
Large-scale study comparing LLM-graph interaction modes for node classification, finding code generation outperforms prompting on long-text and high-degree graphs.