ICLR 2026 Orals

Intrinsic Entropy of Context Length Scaling in LLMs

Jingzhe Shi, Qinwei Ma, Hongyi Liu, Hang Zhao, Jenq-Neng Hwang, Lei Li

LLMs & Reasoning Thu, Apr 23 · 3:51 PM–4:01 PM · 201 A/B Avg rating: 5.50 (2–10)
Author-provided TL;DR

We propose to use Intrinsic Entropy for understanding the impact of context length on Language Modeling, and conduct experiments on language and synthetic datasets to validate our theoretical assumptions and deductions.

Abstract

Long Context Language Models have drawn great attention in the past few years. There has been work discussing the impact of long context on Language Model performance: some find that long irrelevant context can harm performance, while others experimentally summarize the loss reduction from relevant long context as Scaling Laws. This calls for a more thorough understanding of how long context impacts Language Modeling. In this work, we (1) propose to use 'Intrinsic Entropy' to explain the impact of context length on language modeling; and (2) conduct experiments on natural language and synthetic data, validating our proposed theoretical assumptions and deductions. Our theoretical framework provides practical insights, such as establishing that training dataset size dictates an optimal context length and bounds context length scaling in certain cases. We hope our work may inspire new long context Language Models, as well as future work studying the physics of Language Models.

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

A theory of context length scaling based on Intrinsic Entropy, explaining the relationship between optimal context length and training dataset size.

Contributions · Auto-generated by claude-haiku-4-5-20251001
  • Intrinsic Entropy framework for explaining the impact of context length on language modeling
  • Theoretical analysis establishing a linear relation between cross-entropy loss and Intrinsic Entropy
  • Finding that an optimal context length exists and increases with dataset size during pretraining
Methods used · Auto-generated by claude-haiku-4-5-20251001
  • Intrinsic Entropy analysis
  • Bayes Risk framework
  • Approximation Loss analysis
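The second contribution above posits a linear relation between measured cross-entropy loss and Intrinsic Entropy. The following is an illustrative sketch only, not the paper's implementation: it simulates loss measurements generated by a hypothetical linear law and recovers the slope and intercept with a least-squares fit, which is the kind of validation the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Intrinsic Entropy values at increasing context lengths
# (decreasing, since more context reduces uncertainty about the next token).
intrinsic_entropy = np.array([4.0, 3.6, 3.3, 3.1, 3.0])

# Assumed linear law: loss = a * intrinsic_entropy + b, plus small noise.
a_true, b_true = 1.2, 0.5
loss = a_true * intrinsic_entropy + b_true + rng.normal(0, 0.01, size=5)

# Least-squares fit recovers the assumed coefficients from noisy data.
a_hat, b_hat = np.polyfit(intrinsic_entropy, loss, deg=1)
print(a_hat, b_hat)
```

All numeric values here are invented for illustration; the paper's actual measurements of Intrinsic Entropy and loss would take their place.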
Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • The theory starting from Intrinsic Entropy holds only under specific assumptions
  • The explanation relies on an Intrinsic Space perspective, which depends on the data, the neural network, and the prediction task
Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • Propose more fundamental theories to explain the Intrinsic Entropy measurements

Author keywords

  • context length
  • intrinsic entropy
