ICLR 2026 Orals

AdAEM: An Adaptively and Automated Extensible Measurement of LLMs' Value Difference

Jing Yao, Shitong Duan, Xiaoyuan Yi, Dongkuan Xu, Peng Zhang, Tun Lu, Ning Gu, Zhicheng Dou, Xing Xie

LLMs & Reasoning · Thu, Apr 23 · 4:03 PM–4:13 PM · 203 A/B · Avg rating: 7.00 (4–8)
Author-provided TL;DR

This paper proposes a novel dynamic and automated evaluation framework to probe LLMs' value orientations and value differences.

Abstract

Assessing Large Language Models’ (LLMs) underlying value differences enables comprehensive comparison of their misalignment, cultural adaptability, and biases. Nevertheless, current value measurement methods face the informativeness challenge: with often outdated, contaminated, or generic test questions, they can only capture the orientations on common safety values, e.g., HHH (helpful, honest, harmless), shared among different LLMs, leading to indistinguishable and uninformative results. To address this problem, we introduce AdAEM, a novel, self-extensible evaluation algorithm for revealing LLMs’ inclinations. Distinct from static benchmarks, AdAEM automatically and adaptively generates and extends its test questions. This is achieved by probing the internal value boundaries of a diverse set of LLMs developed across cultures and time periods in an in-context optimization manner. Such a process theoretically maximizes an information-theoretic objective to extract diverse controversial topics that can provide more distinguishable and informative insights about models’ value differences. In this way, AdAEM is able to co-evolve with the development of LLMs, consistently tracking their value dynamics. We use AdAEM to generate novel questions and conduct an extensive analysis, demonstrating our method’s validity and effectiveness, laying the groundwork for better interdisciplinary research on LLMs’ values and alignment. Code and the generated evaluation questions are released at https://github.com/ValueCompass/AdAEM.
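To make the pipeline concrete, here is a minimal Python sketch of the generate–probe–select loop the abstract describes. The interfaces (`propose`, `ask`) and the entropy-based score, used as a stand-in for the paper's information-theoretic objective, are illustrative assumptions, not the authors' implementation:

```python
# Illustrative sketch of AdAEM's adaptive loop (assumed interfaces, not the
# authors' code): extend seed questions in-context, probe a diverse pool of
# LLMs, and keep the questions on which the models diverge the most.
from collections import Counter
from math import log2

def answer_entropy(answers):
    """Shannon entropy (bits) of the answer distribution across models;
    higher entropy means more disagreement, i.e., a more informative question."""
    counts = Counter(answers)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def adaem_round(seed_questions, propose, ask, llm_pool, keep_top=5):
    """One optimization round. `propose(q)` returns candidate extensions of a
    seed question; `ask(model, q)` returns a model's (discretized) stance."""
    candidates = [cand for q in seed_questions for cand in propose(q)]
    scored = [(answer_entropy([ask(m, q) for m in llm_pool]), q)
              for q in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most divisive first
    return [q for _, q in scored[:keep_top]]  # seeds for the next round
```

Because each round's winning questions seed the next round, the question set can keep extending as new models enter the pool, which matches the abstract's claim that AdAEM co-evolves with the development of LLMs.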

One-sentence summary·Auto-generated by claude-haiku-4-5-20251001

AdAEM dynamically generates value-assessment questions for LLMs by probing internal value boundaries using in-context optimization.

Contributions·Auto-generated by claude-haiku-4-5-20251001
  • Introduces AdAEM, a self-extensible evaluation framework that automatically generates diverse controversial topics to maximize an information-theoretic objective
  • Framework co-evolves with LLM development by probing internal value boundaries of diverse LLMs across cultures and time periods
  • Addresses informativeness challenge of static benchmarks with outdated or contaminated test questions
Methods used·Auto-generated by claude-haiku-4-5-20251001
  • In-context optimization
  • Information-theoretic objective (an illustrative formalization follows this list)
  • Schwartz's Value Theory
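As a rough illustration of what such an objective could look like (an assumption on our part; the paper's exact formulation may differ), one natural choice is to prefer questions q whose answers A are maximally informative about the identity M of the responding model:

```latex
% Illustrative objective (assumption): pick the question whose answers best
% discriminate between models, i.e., maximize conditional mutual information.
q^{*} = \arg\max_{q} \; I(A; M \mid q)
      = \arg\max_{q} \; \big[ H(A \mid q) - H(A \mid M, q) \big]
```

Under this reading, the entropy score in the sketch above corresponds to the first term, with the second term vanishing if each model answers deterministically.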
Datasets used·Auto-generated by claude-haiku-4-5-20251001
  • Touche23-ValueEval
  • ValueBench
Limitations (author-stated)·Auto-generated by claude-haiku-4-5-20251001
  • Relies exclusively on Schwartz's Value Theory, which may introduce biases and miss alternative value dimensions from other theories
  • Dataset limited to English-speaking contexts; future work needs multiple languages and cultural perspectives
  • Methods could be misused to exploit controversial topics in ways that harm LLMs or society
Future work (author-stated)·Auto-generated by claude-haiku-4-5-20251001
  • Integrate multiple value theories or adopt a comparative approach for a more holistic understanding of human values
  • Incorporate multiple languages and cultural perspectives in value evaluation
  • Collect harmful questions generated by AdAEM to fine-tune better guardrail models

Author keywords

  • LLM Evaluation
  • Value Evaluation
  • Value Alignment
  • Dynamic Evaluation
  • Value Difference
