Differentially Private Domain Discovery
WGM-based methods provide efficient domain discovery with near-optimal guarantees for missing mass on Zipfian data.
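A minimal sketch of threshold-based private domain discovery with a Gaussian mechanism, in the style of differentially private set union, may help make the setting concrete. It is not the WGM-based method summarized above. Weights here are uniform, so each user's contribution vector has L2 norm 1, and nothing here reproduces the missing-mass guarantees. The function name, the per-user cap `max_items`, and the parameters `sigma` and `threshold` are hypothetical, and calibrating them to a privacy budget is left out.

```python
import math
import random
from collections import defaultdict


def gaussian_domain_discovery(user_items, max_items, sigma, threshold, seed=None):
    """Release items whose noisy weighted count exceeds `threshold`.

    Hypothetical sketch: `max_items`, `sigma` and `threshold` are illustrative
    parameters; tying them to a target (epsilon, delta) is omitted.
    """
    rng = random.Random(seed)
    counts = defaultdict(float)
    for items in user_items:
        # Deduplicate and cap each user's contribution (sorted for determinism).
        kept = sorted(set(items))[:max_items]
        if not kept:
            continue
        # Uniform weights with L2 norm 1 bound each user's L2 sensitivity by 1.
        w = 1.0 / math.sqrt(len(kept))
        for item in kept:
            counts[item] += w
    # Only items present in the data are candidates: add Gaussian noise to each
    # weighted count and keep the items that clear the threshold.
    return {item for item, c in counts.items()
            if c + rng.gauss(0.0, sigma) > threshold}
```

Only items that occur in the data can be released, which is why, in analyses of this kind, the threshold is set well above the largest weight a single user can contribute (1 here).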
AI safety, alignment, jailbreaks, adversarial robustness, privacy, differential privacy, and membership inference.
EigenBench measures the value alignment of language models by aggregating judgments from an ensemble of models with EigenTrust, without ground-truth labels.
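The EigenTrust aggregation step can be sketched as a power iteration over row-normalized judgments. This is a generic EigenTrust sketch under illustrative assumptions, not EigenBench's implementation: the input matrix `scores` (how favorably judge model i rates model j), the exclusion of self-ratings, and the damping weight `alpha` toward a uniform distribution are all hypothetical choices.

```python
def eigentrust(scores, alpha=0.15, tol=1e-10, max_iter=1000):
    """EigenTrust-style global scores from pairwise judgments.

    scores[i][j]: hypothetical rating of model j by judge model i.
    alpha: weight on the uniform (pre-trusted) distribution.
    """
    n = len(scores)
    uniform = [1.0 / n] * n
    local = []
    for i, row in enumerate(scores):
        # Drop self-ratings and negative scores, then row-normalize.
        clipped = [0.0 if j == i else max(float(s), 0.0) for j, s in enumerate(row)]
        total = sum(clipped)
        # A judge with no positive ratings defers to the uniform distribution.
        local.append([c / total for c in clipped] if total > 0 else uniform[:])
    t = uniform[:]
    for _ in range(max_iter):
        # t_new[j] = (1 - alpha) * sum_i local[i][j] * t[i] + alpha * uniform[j]
        new = [(1 - alpha) * sum(local[i][j] * t[i] for i in range(n)) + alpha * uniform[j]
               for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, t)) < tol:
            return new
        t = new
    return t
```

The damping term keeps the iteration well defined even when some judges rate no one positively, at the cost of pulling all scores slightly toward uniform.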
Ellipse signatures function as forgery-resistant model output identifiers based on high-dimensional geometric constraints.
Analyzes machine unlearning in high dimensions, showing that a single Newton step with added Gaussian noise suffices to achieve both privacy and accuracy.
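The noisy Newton step can be illustrated on ridge regression. This is a hypothetical sketch rather than the analyzed procedure: the function names and the noise scale `sigma` are assumptions, and calibrating `sigma` to a privacy guarantee is omitted. Ridge is chosen because its objective is quadratic, so with `sigma=0` one Newton step from the full-data solution lands exactly on the retrained solution, which makes the sketch easy to check; for non-quadratic losses a single step is only approximate.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Exact minimizer of 0.5*||X w - y||^2 + 0.5*lam*||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def noisy_newton_unlearn(w, X, y, remove_idx, lam, sigma, rng=None):
    """Forget rows `remove_idx` with one Newton step plus Gaussian noise.

    Hypothetical sketch: `sigma` is an illustrative noise scale.
    """
    rng = np.random.default_rng() if rng is None else rng
    keep = np.setdiff1d(np.arange(X.shape[0]), remove_idx)
    Xr, yr = X[keep], y[keep]
    d = X.shape[1]
    # Gradient and Hessian of the retained-data objective at the current weights.
    grad = Xr.T @ (Xr @ w - yr) + lam * w
    hess = Xr.T @ Xr + lam * np.eye(d)
    w_new = w - np.linalg.solve(hess, grad)
    # Perturb the updated weights with isotropic Gaussian noise.
    return w_new + sigma * rng.standard_normal(d)
```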
Releases Hubble, a suite of open-source LLMs with controlled perturbed variants, to systematically study memorization risks.
Introduces semantically conditioned watermarks for stealthy LLM fingerprinting that remains robust across deployment scenarios.
Omni-Reward addresses modality imbalance and preference rigidity with an omni-modal reward modeling framework.
PATEGAIL++ is a privacy-preserving trajectory generation framework that uses sensitivity-aware noise allocation to improve the privacy-utility trade-off.
Introduces RedTeamCUA, a framework with a hybrid web-OS sandbox for adversarial testing of computer-use agents.
MetamerGen generates scene metamers aligned with human perception using foveal/peripheral features and latent diffusion.
Watermarks diffusion models losslessly via a spherical mapping that preserves the Gaussian prior up to third-order moments.
Presents a framework for studying strategic control of social learning by algorithmic information mediators, combining theoretical analysis with LLM-based simulations.
Proposes CorreGen, a generative framework for multi-view clustering under noisy correspondence, built on the expectation-maximization (EM) algorithm.
WIMHF uses sparse autoencoders to extract human-interpretable features from preference data, enabling better understanding and curation of human feedback.