Gaussian certified unlearning in high dimensions: A hypothesis testing approach

Aaradhya Pandey, Arnab Auddy, Haolin Zou, Arian Maleki, Sanjeev Kulkarni

Safety, Privacy & Alignment Thu, Apr 23 · 11:42 AM–11:52 AM · 204 A/B Avg rating: 6.00 (4–8)

Author-provided TL;DR

We introduce a canonical, dimension-free notion of certifiability suited to high dimensions and demonstrate its utility with a Newton-based unlearning algorithm.

Abstract

Machine unlearning seeks to efficiently remove the influence of selected data while preserving generalization. Significant progress has been made in low dimensions, where the parameter dimension $p$ is much smaller than the sample size $n$. High dimensions, including proportional regimes $p \sim n$, pose serious theoretical challenges, because the standard optimization assumptions of $\Omega(1)$ strong convexity and $O(1)$ smoothness of the per-example loss $f$ rarely hold simultaneously there. In this work, we introduce $\varepsilon$-Gaussian certifiability, a canonical and robust notion well suited to high-dimensional regimes that optimally captures a broad class of noise-adding mechanisms. We then theoretically analyze, in this high-dimensional setting, a widely used unlearning algorithm based on one step of Newton's method. Our analysis shows that a single Newton step followed by well-calibrated Gaussian noise is sufficient to achieve both privacy and accuracy. This result stands in sharp contrast to the only prior work that analyzes machine unlearning in high dimensions, \citet{zou2025certified}, which relaxes some of the standard optimization assumptions for high-dimensional applicability but operates under the notion of $\varepsilon$-certifiability. That work concludes that a single Newton step is insufficient even for removing a single data point, and that at least two steps are required to ensure both privacy and accuracy. We conclude that the discrepancy in the number of steps arises from the suboptimality of $\varepsilon$-certifiability and its incompatibility with noise-adding mechanisms, which $\varepsilon$-Gaussian certifiability overcomes optimally.
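
The "one Newton step plus Gaussian noise" recipe described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the ridge-regularized logistic loss, the helper names (`grad_hess`, `fit`, `newton_unlearn`), the problem sizes, and the noise scale `sigma` are all assumptions made for the example, and the paper's actual noise calibration is not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_hess(beta, X, y, lam):
    """Gradient and Hessian of the summed logistic loss plus a ridge penalty."""
    prob = sigmoid(X @ beta)
    grad = X.T @ (prob - y) + lam * beta
    weights = prob * (1.0 - prob)
    hess = X.T @ (X * weights[:, None]) + lam * np.eye(X.shape[1])
    return grad, hess

def fit(X, y, lam, iters=25):
    """Train by plain Newton iterations starting from zero."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad, hess = grad_hess(beta, X, y, lam)
        beta = beta - np.linalg.solve(hess, grad)
    return beta

def newton_unlearn(beta_hat, X, y, lam, remove_idx, sigma, rng):
    """One Newton step on the objective without the removed rows, then noise."""
    keep = np.ones(len(y), dtype=bool)
    keep[remove_idx] = False
    grad, hess = grad_hess(beta_hat, X[keep], y[keep], lam)
    beta_newton = beta_hat - np.linalg.solve(hess, grad)
    # sigma is a hypothetical noise scale, chosen only for illustration.
    noise = sigma * rng.standard_normal(beta_hat.shape)
    return beta_newton, beta_newton + noise

rng = np.random.default_rng(0)
n, p, lam = 200, 100, 1.0                      # proportional-style sizes, p comparable to n
X = rng.standard_normal((n, p)) / np.sqrt(p)
beta_star = rng.standard_normal(p)
y = (rng.random(n) < sigmoid(X @ beta_star)).astype(float)

beta_hat = fit(X, y, lam)                      # model trained on all data
beta_newton, beta_released = newton_unlearn(
    beta_hat, X, y, lam, remove_idx=[0], sigma=0.01, rng=rng
)
beta_retrain = fit(X[1:], y[1:], lam)          # reference: retrain without row 0

err_before = np.linalg.norm(beta_hat - beta_retrain)
err_after = np.linalg.norm(beta_newton - beta_retrain)
print(f"distance to retrained model: before={err_before:.2e}, after={err_after:.2e}")
```

The printed distances compare the original estimate and the Newton-updated estimate against a model retrained from scratch without the removed point. The released vector `beta_released` adds Gaussian noise on top, which is the part a certifiability notion is meant to calibrate.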

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

Analyzes machine unlearning in high dimensions and shows that a single Newton step followed by calibrated Gaussian noise suffices for both privacy and accuracy.

Contributions · Auto-generated by claude-haiku-4-5-20251001

Introduces $\varepsilon$-Gaussian certifiability, a notion of certifiability that optimally captures noise-adding mechanisms in high-dimensional unlearning
Proves that a single Newton step followed by Gaussian noise achieves both privacy and accuracy in the proportional regime $p \sim n$
Improves on prior work, which required two Newton steps, by adopting a better notion of certifiability

Methods used · Auto-generated by claude-haiku-4-5-20251001

Newton method
Gaussian noise addition
Hypothesis testing framework
Differential privacy
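
As background for the hypothesis-testing view listed above: when an adversary tests $N(0,1)$ against $N(\mu,1)$ at type I error $\alpha$, the smallest achievable type II error is $\Phi(\Phi^{-1}(1-\alpha) - \mu)$, a standard consequence of the Neyman-Pearson lemma. The sketch below evaluates that curve. This listing does not state the formal definition of $\varepsilon$-Gaussian certifiability, so how the paper ties $\varepsilon$ to such a curve is an assumption here, and the function name `gaussian_tradeoff` is hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)

def gaussian_tradeoff(alpha, mu):
    """Smallest type II error for testing N(0,1) vs N(mu,1) at type I error alpha.

    alpha must lie strictly between 0 and 1.
    """
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - alpha) - mu)

# A larger mean shift mu makes the two hypotheses easier to tell apart,
# so the best achievable type II error shrinks.
for mu in (0.0, 0.5, 1.0, 2.0):
    print(mu, round(gaussian_tradeoff(0.05, mu), 4))
```

At `mu = 0` the two hypotheses are identical and the curve reduces to `1 - alpha`, meaning no test does better than chance. Larger values of `mu` correspond to outputs that are easier to distinguish.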

Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Linear Hawkes process assumption does not guarantee non-negativity of intensity function
from the paper
Computational complexity scales cubically with dimensionality, making the method suitable for moderate dimensions of up to a few hundred
from the paper

Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Analyze beyond generalized linear models and high-dimensional regimes
from the paper
Extend to non-convex loss functions for deep neural networks
from the paper
Investigate approximate second-order methods like conjugate gradient to reduce Hessian computation cost
from the paper
Strengthen the scaling for multiple simultaneous deletions from $m^4 = o(n)$ to $m^3 = o(n)$
from the paper
Study distributional unlearning including class or concept unlearning
from the paper
Handle continuous unlearning requests in online setting
from the paper

Author keywords

Machine unlearning in high dimensions
Proportional asymptotics
High dimensional statistical theory
Privacy–accuracy tradeoff
Hypothesis testing
Gaussian noise calibration
Newton method

Related orals

LLM Fingerprinting via Semantically Conditioned Watermarks

Steering the Herd: A Framework for LLM-based Control of Social Learning

Every Language Model Has a Forgery-Resistant Signature

Differentially Private Domain Discovery

What's In My Human Feedback? Learning Interpretable Descriptions of Preference Data