ICLR 2026 Orals

MrRoPE: Mixed-radix Rotary Position Embedding

Qingyuan Tian, Wenhong Zhu, Xiaoran Liu, Xiaofeng Wang, Rui Wang

Uncategorized · Fri, Apr 24 · 3:27 PM–3:37 PM · Amphitheater · Avg rating: 6.50 (6–8)
Author-provided TL;DR

We present MrRoPE, a unified theory linking major RoPE-extension methods to radix conversion. Building on this theory, we propose MrRoPE-Pro, a training-free context-window extension method.

Abstract

Rotary Position Embedding (RoPE) extension refers to modifying or generalizing the Rotary Position Embedding scheme to handle longer sequences than those encountered during pre-training. However, current extension strategies are highly diverse and lack a unified theoretical foundation. In this paper, we propose $\textbf{\textit{MrRoPE (Mixed-radix RoPE)}}$, a generalized encoding formulation based on a radix-system-conversion perspective, which elegantly unifies various RoPE-extension approaches as distinct radix conversion strategies. Based on this theory, we introduce two training-free extensions, $\textbf{\textit{MrRoPE-Uni}}$ and $\textbf{\textit{MrRoPE-Pro}}$, which leverage uniform and progressive radix conversion strategies, respectively, to achieve “train short, test long” generalization. Without fine-tuning, MrRoPE-Pro sustains over 85% recall in the 128K-context Needle-in-a-Haystack test and achieves more than double YaRN’s accuracy on Infinite-Bench retrieval and dialogue subsets. Theoretical analysis confirms that MrRoPE-Pro effectively raises the upper bound of RoPE's attainable encoding length, which further validates the reliability and utility of our theory and methodology.
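For readers unfamiliar with the methods the abstract says are being unified: the sketch below shows standard RoPE per-dimension angles plus two classic training-free extensions, Position Interpolation and NTK-aware base rescaling. This is illustrative background only (function names and the `dim`/`base` defaults are ours), not the paper's MrRoPE formulation.

```python
import numpy as np

def rope_angles(pos, dim=64, base=10000.0):
    # Standard RoPE: the i-th 2-D subspace rotates by pos * base^(-2i/dim).
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    return pos * inv_freq

def pi_angles(pos, scale, dim=64, base=10000.0):
    # Position Interpolation: compress position indices by the scale factor,
    # so an extended context reuses the trained angle range.
    return rope_angles(pos / scale, dim=dim, base=base)

def ntk_angles(pos, scale, dim=64, base=10000.0):
    # NTK-aware scaling: enlarge the rotary base instead of shrinking positions,
    # leaving high-frequency dimensions nearly untouched.
    return rope_angles(pos, dim=dim, base=base * scale ** (dim / (dim - 2)))
```

Note how the two extensions differ per dimension: interpolation scales every angle by 1/scale, while NTK scaling leaves the highest-frequency pair unchanged and only matches the 1/scale compression at the lowest-frequency pair — the kind of per-dimension behavior a radix-conversion view can describe uniformly.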

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

MrRoPE generalizes RoPE-extension via radix system conversion, achieving train-short-test-long with doubled effective context window.

Contributions · Auto-generated by claude-haiku-4-5-20251001
  • Proposes unified theory MrRoPE linking major RoPE-extension methods as distinct radix conversion strategies
  • Introduces MrRoPE-Uni and MrRoPE-Pro training-free extensions using uniform and progressive radix conversion
  • Achieves over 85% recall in the 128K-context Needle-in-a-Haystack test and more than double YaRN's accuracy on Infinite-Bench subsets
  • Theoretical analysis confirms MrRoPE-Pro raises upper bound of RoPE's attainable encoding length
Methods used · Auto-generated by claude-haiku-4-5-20251001
  • Positional embeddings
  • Rotary position encoding
  • Radix system conversion
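"Radix system conversion" can be made concrete with a plain mixed-radix decomposition, where each digit position has its own base. This is an intuition aid in our own words (the helper name is ours), not the paper's exact encoding.

```python
def to_mixed_radix(n, radices):
    # Decompose a non-negative integer into mixed-radix digits,
    # least-significant digit first: radices [10, 16] mean the first
    # digit cycles mod 10, the next mod 16, and so on.
    digits = []
    for r in radices:
        digits.append(n % r)
        n //= r
    return digits
```

For example, `to_mixed_radix(123, [10, 10, 10])` gives `[3, 2, 1]`, the usual decimal digits in reverse; changing the radix list re-encodes the same position with different per-digit capacities.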
Datasets used · Auto-generated by claude-haiku-4-5-20251001
  • Needle-in-a-Haystack
  • Infinite-Bench
Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • Primarily focuses on training-free extension; absence of fine-tuning experiments limits comparisons with methods like xPOS and LongRoPE
  • Theoretical framework tied to RoPE mechanism; generalizability to other positional encoding schemes remains open
Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • Incorporate fine-tuning experiments for more comprehensive demonstration of superiority and integration potential

Author keywords

  • transformers
  • nlp
  • llms
  • context window extension
  • attention
  • rotary embedding

Related orals

Information Shapes Koopman Representation

Proposes information-theoretic Lagrangian formulation to balance simplicity and expressiveness in Koopman representation learning for dynamical systems.

Avg rating: 5.50 (4–6) · Xiaoyuan Cheng et al.