ICLR 2026 Orals

Differentially Private Domain Discovery

Vinod Raman, Travis Dick, Matthew Joseph

Safety, Privacy & Alignment Thu, Apr 23 · 3:39 PM–3:49 PM · 201 C Avg rating: 6.50 (6–8)

Abstract

We study several problems in differentially private domain discovery, where each user holds a subset of items from a shared but unknown domain, and the goal is to output an informative subset of items. For set union, we show that the simple baseline Weighted Gaussian Mechanism (WGM) has a near-optimal $\ell_1$ missing mass guarantee on Zipfian data as well as a distribution-free $\ell_\infty$ missing mass guarantee. We then apply the WGM as a domain-discovery precursor for existing known-domain algorithms for private top-$k$ and $k$-hitting set and obtain new utility guarantees for their unknown domain variants. Finally, experiments demonstrate that all of our WGM-based methods are competitive with or outperform existing baselines for all three problems.

One-sentence summary·Auto-generated by claude-haiku-4-5-20251001(?)

WGM-based methods provide efficient domain discovery with near-optimal guarantees for missing mass on Zipfian data.

Contributions·Auto-generated by claude-haiku-4-5-20251001(?)
  • Proves WGM has near-optimal ell1 missing mass guarantee on Zipfian data
  • Distribution-free ell-infinity missing mass guarantee for WGM
  • Applies WGM as precursor for private top-k and k-hitting set algorithms with new utility guarantees
  • Competitive or outperforming baseline methods across all three domain discovery problems
Methods used·Auto-generated by claude-haiku-4-5-20251001(?)
  • Weighted Gaussian Mechanism
  • Differential privacy
  • Domain discovery
Datasets used·Auto-generated by claude-haiku-4-5-20251001(?)
  • Steam Games
  • Amazon Magazine
  • Amazon Pantry
Limitations (author-stated)·Auto-generated by claude-haiku-4-5-20251001(?)

Authors did not state explicit limitations.

Future work (author-stated)·Auto-generated by claude-haiku-4-5-20251001(?)
  • Close gaps between upper and lower bounds for top-k and k-hitting set problems
    from the paper
  • Extend data-dependent subsampling strategies from Chen et al. 2025 to missing mass
    from the paper

Author keywords

  • Differential Privacy
  • Partition Selection
  • Top-k Selection

Related orals

Something off? Let us know →