Intrinsic Entropy of Context Length Scaling in LLMs
Jingzhe Shi, Qinwei Ma, Hongyi Liu, Hang Zhao, Jenq-Neng Hwang, Lei Li
We propose to use Intrinsic Entropy to understand the impact of context length on language modeling, and conduct experiments on language and synthetic datasets to validate our theoretical assumptions and deductions.
Abstract
Long Context Language Models have drawn great attention in the past few years. There has been work discussing the impact of long context on Language Model performance: some find that long irrelevant context can harm performance, while others experimentally summarize the loss reduction from relevant long context as Scaling Laws. This calls for a more thorough understanding of how long context impacts Language Modeling. In this work, we (1) propose to use "Intrinsic Entropy" to explain the impact of context length on language modeling; and (2) conduct experiments on natural language and synthetic data, validating our proposed theoretical assumptions and deductions. Our theoretical framework provides practical insights, such as establishing that training dataset size dictates an optimal context length and bounds context length scaling in certain cases. We hope our work may inspire new long context Language Models, as well as future work studying the physics of Language Models.
A theory of context length scaling via Intrinsic Entropy, explaining the relationship between optimal context length and training dataset size.
- Intrinsic Entropy framework for explaining the impact of context length on language modeling
- Theoretical analysis establishing a linear relation between cross-entropy loss and Intrinsic Entropy
- Finding that an optimal context length exists and increases with dataset size in pretraining
- Intrinsic Entropy analysis
- Bayes Risk framework
- Approximation Loss analysis
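The existence of an optimal context length for a fixed dataset size can be illustrated with a toy bias-variance experiment (a minimal sketch with assumed parameters, not the paper's framework or experiments): fitting n-gram models with increasing context length to a finite sample from an order-2 Markov chain, held-out cross-entropy is lowest near the true order, while an overlong context overfits the small training set.

```python
import math
import random
from collections import Counter

def gen_markov2(length, rng):
    # Order-2 Markov chain over alphabet {0,1,2,3}: the next symbol is
    # (a + b) % 4 with probability 0.7, otherwise uniform over the rest.
    seq = [rng.randrange(4), rng.randrange(4)]
    for _ in range(length - 2):
        a, b = seq[-2], seq[-1]
        fav = (a + b) % 4
        if rng.random() < 0.7:
            seq.append(fav)
        else:
            seq.append(rng.choice([s for s in range(4) if s != fav]))
    return seq

def ngram_cross_entropy(train, test, c, alpha=1.0):
    # Fit a context-length-c n-gram model with add-alpha smoothing on
    # `train`; return mean negative log-likelihood (nats/token) on `test`.
    ctx_counts, pair_counts = Counter(), Counter()
    for i in range(c, len(train)):
        ctx = tuple(train[i - c:i])
        ctx_counts[ctx] += 1
        pair_counts[(ctx, train[i])] += 1
    nll, n = 0.0, 0
    for i in range(c, len(test)):
        ctx = tuple(test[i - c:i])
        p = (pair_counts[(ctx, test[i])] + alpha) / (ctx_counts[ctx] + 4 * alpha)
        nll += -math.log(p)
        n += 1
    return nll / n

rng = random.Random(0)
train = gen_markov2(2000, rng)   # small, fixed training set
test = gen_markov2(2000, rng)
losses = {c: ngram_cross_entropy(train, test, c) for c in (0, 2, 8)}
# Context length 2 (the true order) beats both the no-context model and
# the overlong context-8 model, whose contexts are rarely seen in training.
```

With more training data, the context-8 model's estimates improve and the loss gap narrows, which mirrors the listed finding that the optimal context length grows with dataset size.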
Limitations
- The theory starting from Intrinsic Entropy only holds under specific assumptions (from the paper)
- The explanation relies on an Intrinsic Space perspective, which depends on the data, the neural network, and the prediction task (from the paper)
Future work
- Propose more fundamental theories to explain Intrinsic Entropy measurements (from the paper)
Author keywords
- context length
- intrinsic entropy
Related orals
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Benchmarks practical privacy risks in differential privacy-adapted LLMs, revealing that distribution shifts and model choice affect protection effectiveness.
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
Proposes Recursive Likelihood Ratio optimizer for efficient fine-tuning of diffusion models with lower variance gradient estimation.
Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
Demonstrates LLMs can be finetuned to generate harmful steganographically-hidden outputs while appearing benign to safety systems.
Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents
Proposes T3 algorithm to detect belief deviation in LLM agents and truncate trajectories for improved reinforcement learning in active reasoning tasks.
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
RefineStat enforces semantic constraints and applies diagnostic-aware refinement for synthesizing valid probabilistic programs from smaller language models.