ICLR 2026 Orals

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

Hongli Yu, Tinghong Chen, Jiangtao Feng, Jiangjie Chen, Weinan Dai, Qiying Yu, Ya-Qin Zhang, Wei-Ying Ma, Jingjing Liu, Mingxuan Wang, Hao Zhou

Reinforcement Learning & Agents · Thu, Apr 23 · 10:54 AM–11:04 AM · Amphitheater · Avg rating: 6.50 (4–8)
Author-provided TL;DR

We propose MemAgent, a novel agent workflow for long-text processing that demonstrates exceptional extrapolation and performance on large-scale tasks after RL training.

Abstract

Despite improvements from length extrapolation, efficient attention, and memory modules, handling infinitely long documents without performance degradation during extrapolation remains the ultimate challenge in long-text processing. To solve this problem, we introduce a novel agent workflow, MemAgent, which processes text in segments and updates memory through an overwrite strategy, addressing long-context tasks through enhanced memory management. We further extend the DAPO algorithm to directly optimize memory ability in an end-to-end fashion, facilitating training via independent-context multi-conversation generation. Experimental results demonstrate that MemAgent has superb long-context capabilities: it extrapolates from an 8K context to a 3.5M-token QA task with a performance loss of less than 10% and achieves over 95% on the 512K NIAH test.
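The segment-and-overwrite workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the chunk size, prompt wording, and the `llm` callable are all assumptions for the sketch.

```python
def memagent_answer(llm, document, question, chunk_size=4096):
    """Sketch of a segment-and-overwrite memory workflow: the model reads
    the document chunk by chunk, each time rewriting a bounded textual
    memory instead of growing the context window."""
    memory = ""  # fixed-size memory, fully overwritten at every step
    chunks = [document[i:i + chunk_size]
              for i in range(0, len(document), chunk_size)]
    for chunk in chunks:
        # Each call sees only (question, current memory, one chunk), so the
        # context length stays bounded regardless of document length.
        memory = llm(
            f"Question: {question}\n"
            f"Current memory: {memory}\n"
            f"New text: {chunk}\n"
            "Rewrite the memory, keeping only information "
            "relevant to the question."
        )
    # The final answer is produced from the memory alone.
    return llm(f"Question: {question}\nMemory: {memory}\nAnswer:")
```

Because every call has the same bounded input size, the number of LLM calls grows linearly with document length while per-call cost stays constant, which is what allows extrapolation far beyond the training context.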

One-sentence summary·Auto-generated by claude-haiku-4-5-20251001

MemAgent uses RL-trained memory modules to enable LLMs to extrapolate from 8K to 3.5M token contexts with minimal performance degradation.

Contributions·Auto-generated by claude-haiku-4-5-20251001
  • Introduces MemAgent, an RL-trained memory agent for long-context processing that selectively records relevant information
  • Extends DAPO algorithm for end-to-end optimization of memory ability via independent-context multi-conversation generation
  • Demonstrates extrapolation from 8K training context to 3.5M tokens with less than 10% performance loss
Methods used·Auto-generated by claude-haiku-4-5-20251001
  • Reinforcement learning
  • Memory agents
  • DAPO algorithm
Datasets used·Auto-generated by claude-haiku-4-5-20251001
  • NIAH (Needle-in-a-Haystack) test
  • LongBench
Limitations (author-stated)·Auto-generated by claude-haiku-4-5-20251001

The authors did not state explicit limitations.

Future work (author-stated)·Auto-generated by claude-haiku-4-5-20251001
  • Develop more advanced memory architectures and training strategies to enhance the long-context capabilities of LLMs

Author keywords

  • LLM
  • memory
  • agent
  • RLVR
