AdAEM: An Adaptively and Automated Extensible Measurement of LLMs' Value Difference
Jing Yao, Shitong Duan, Xiaoyuan Yi, Dongkuan Xu, Peng Zhang, Tun Lu, Ning Gu, Zhicheng Dou, Xing Xie
This paper proposes a novel dynamic and automated evaluation framework to probe LLMs' value orientations and value differences.
Abstract
Assessing Large Language Models’ (LLMs) underlying value differences enables comprehensive comparison of their misalignment, cultural adaptability, and biases. Nevertheless, current value measurement methods face the informativeness challenge: with often outdated, contaminated, or generic test questions, they can only capture the orientations on common safety values, e.g., HHH, shared among different LLMs, leading to indistinguishable and uninformative results. To address this problem, we introduce AdAEM, a novel, self-extensible evaluation algorithm for revealing LLMs’ inclinations. Distinct from static benchmarks, AdAEM automatically and adaptively generates and extends its test questions. This is achieved by probing the internal value boundaries of a diverse set of LLMs developed across cultures and time periods in an in-context optimization manner. Such a process theoretically maximizes an information-theoretic objective to extract diverse controversial topics that can provide more distinguishable and informative insights about models’ value differences. In this way, AdAEM is able to co-evolve with the development of LLMs, consistently tracking their value dynamics. We use AdAEM to generate novel questions and conduct an extensive analysis, demonstrating our method’s validity and effectiveness, laying the groundwork for better interdisciplinary research on LLMs’ values and alignment. Codes and the generated evaluation questions are released at https://github.com/ValueCompass/AdAEM.
AdAEM dynamically generates value-assessment questions for LLMs by probing internal value boundaries using in-context optimization.
- Introduces self-extensible evaluation framework AdAEM that automatically generates diverse controversial topics to maximize information-theoretic objective
- Framework co-evolves with LLM development by probing internal value boundaries of diverse LLMs across cultures and time periods
- Addresses informativeness challenge of static benchmarks with outdated or contaminated test questions
- In-context optimization
- Information-theoretic objective
- Schwartz's Value Theory
- Touche23-ValueEval
- ValueBench
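The idea of selecting questions that maximize an information-theoretic objective over a pool of models can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the stance labels, candidate topics, and the use of Shannon entropy over model responses as the disagreement score are all assumptions made for the example.

```python
import math
from collections import Counter

def response_entropy(responses):
    """Shannon entropy (in bits) of stance labels across models.

    Higher entropy means the models disagree more on the topic, so the
    topic is more informative for distinguishing their values."""
    counts = Counter(responses)
    total = len(responses)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def select_informative_topics(topic_responses, k=2):
    """Rank candidate topics by cross-model disagreement and keep the top k.

    `topic_responses` maps each candidate topic to the list of stances a
    diverse set of LLMs took on it."""
    ranked = sorted(topic_responses,
                    key=lambda t: response_entropy(topic_responses[t]),
                    reverse=True)
    return ranked[:k]

# Hypothetical stances from four models on three candidate topics.
candidates = {
    "universal basic income":   ["support", "oppose", "support", "oppose"],
    "honesty matters":          ["support", "support", "support", "support"],
    "surveillance for safety":  ["support", "oppose", "oppose", "oppose"],
}
print(select_informative_topics(candidates, k=2))
# → ['universal basic income', 'surveillance for safety']
```

Topics with full consensus ("honesty matters") carry zero information about value differences and are discarded, mirroring the paper's point that generic safety questions yield indistinguishable results; an adaptive loop would then generate new candidates around the surviving controversial topics.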
- Relies exclusively on Schwartz's Value Theory, which may introduce biases and miss alternative value dimensions from other theories (from the paper)
- Dataset limited to English-speaking contexts; future work needs multiple languages and cultural perspectives (from the paper)
- Methods could be misused to exploit controversial topics in ways that harm LLMs or society (from the paper)
- Integrate multiple value theories or adopt a comparative approach for a more holistic understanding of human values (from the paper)
- Incorporate multiple languages and cultural perspectives in value evaluation (from the paper)
- Collect harmful questions generated by AdAEM to fine-tune better guardrail models (from the paper)
Author keywords
- LLM Evaluation
- Value Evaluation
- Value Alignment
- Dynamic Evaluation
- Value Difference
Related orals
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Benchmarks practical privacy risks in differential privacy-adapted LLMs, revealing distribution shifts and model choice impact effectiveness.
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
Proposes Recursive Likelihood Ratio optimizer for efficient fine-tuning of diffusion models with lower variance gradient estimation.
Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
Demonstrates LLMs can be finetuned to generate harmful steganographically-hidden outputs while appearing benign to safety systems.
Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents
Proposes T3 algorithm to detect belief deviation in LLM agents and truncate trajectories for improved reinforcement learning in active reasoning tasks.
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
RefineStat enforces semantic constraints and applies diagnostic-aware refinement for synthesizing valid probabilistic programs from smaller language models.