AdAEM: An Adaptively and Automated Extensible Measurement of LLMs' Value Difference
Jing Yao, Shitong Duan, Xiaoyuan Yi, Dongkuan Xu, Peng Zhang, Tun Lu, Ning Gu, Zhicheng Dou, Xing Xie
This paper proposes a novel dynamic and automated evaluation framework to probe LLMs' value orientations and value differences.
Abstract
Assessing Large Language Models’ (LLMs) underlying value differences enables comprehensive comparison of their misalignment, cultural adaptability, and biases. Nevertheless, current value measurement methods face the informativeness challenge: with often outdated, contaminated, or generic test questions, they can only capture the orientations on common safety values, e.g., HHH, shared among different LLMs, leading to indistinguishable and uninformative results. To address this problem, we introduce AdAEM, a novel, self-extensible evaluation algorithm for revealing LLMs’ inclinations. Distinct from static benchmarks, AdAEM automatically and adaptively generates and extends its test questions. This is achieved by probing the internal value boundaries of a diverse set of LLMs developed across cultures and time periods in an in-context optimization manner. Such a process theoretically maximizes an information-theoretic objective to extract diverse controversial topics that can provide more distinguishable and informative insights about models’ value differences. In this way, AdAEM is able to co-evolve with the development of LLMs, consistently tracking their value dynamics. We use AdAEM to generate novel questions and conduct an extensive analysis, demonstrating our method’s validity and effectiveness, laying the groundwork for better interdisciplinary research on LLMs’ values and alignment. Codes and the generated evaluation questions are released at https://github.com/ValueCompass/AdAEM.
AdAEM dynamically generates value-assessment questions for LLMs by probing internal value boundaries using in-context optimization.
- Introduces self-extensible evaluation framework AdAEM that automatically generates diverse controversial topics to maximize information-theoretic objective
- Framework co-evolves with LLM development by probing internal value boundaries of diverse LLMs across cultures and time periods
- Addresses informativeness challenge of static benchmarks with outdated or contaminated test questions
- In-context optimization
- Information-theoretic objective
- Schwartz's Value Theory
- Touche23-ValueEval
- ValueBench
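The idea of selecting questions that maximize an information-theoretic objective over a pool of models can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the stance labels, candidate topics, and the use of Shannon entropy over model responses as the disagreement score are all assumptions made for the example.

```python
import math
from collections import Counter

def response_entropy(responses):
    """Shannon entropy (in bits) of stance labels across models.

    Higher entropy means the models disagree more on the topic, so the
    topic is more informative for distinguishing their values."""
    counts = Counter(responses)
    total = len(responses)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def select_informative_topics(topic_responses, k=2):
    """Rank candidate topics by cross-model disagreement and keep the top k.

    `topic_responses` maps each candidate topic to the list of stances a
    diverse set of LLMs took on it."""
    ranked = sorted(topic_responses,
                    key=lambda t: response_entropy(topic_responses[t]),
                    reverse=True)
    return ranked[:k]

# Hypothetical stances from four models on three candidate topics.
candidates = {
    "universal basic income":   ["support", "oppose", "support", "oppose"],
    "honesty matters":          ["support", "support", "support", "support"],
    "surveillance for safety":  ["support", "oppose", "oppose", "oppose"],
}
print(select_informative_topics(candidates, k=2))
# → ['universal basic income', 'surveillance for safety']
```

Topics with full consensus ("honesty matters") carry zero information about value differences and are discarded, mirroring the paper's point that generic safety questions yield indistinguishable results; an adaptive loop would then generate new candidates around the surviving controversial topics.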
- Relies exclusively on Schwartz's Value Theory, which may introduce biases and miss alternative value dimensions from other theories (from the paper)
- Dataset limited to English-speaking contexts; future work needs multiple languages and cultural perspectives (from the paper)
- Methods could be misused to exploit controversial topics in ways that harm LLMs or society (from the paper)
- Integrate multiple value theories or adopt a comparative approach for a more holistic understanding of human values (from the paper)
- Incorporate multiple languages and cultural perspectives in value evaluation (from the paper)
- Collect harmful questions generated by AdAEM to fine-tune better guardrail models (from the paper)
Author keywords
- LLM Evaluation
- Value Evaluation
- Value Alignment
- Dynamic Evaluation
- Value Difference
Related orals
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Benchmarks practical privacy risks in differential privacy-adapted LLMs, revealing distribution shifts and model choice impact effectiveness.
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
Proposes Recursive Likelihood Ratio optimizer for efficient fine-tuning of diffusion models with lower variance gradient estimation.
Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
Demonstrates LLMs can be finetuned to generate harmful steganographically-hidden outputs while appearing benign to safety systems.
Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents
Proposes T3 algorithm to detect belief deviation in LLM agents and truncate trajectories for improved reinforcement learning in active reasoning tasks.
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
RefineStat enforces semantic constraints and applies diagnostic-aware refinement for synthesizing valid probabilistic programs from smaller language models.