ICLR 2026 Orals

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

Hongli Yu, Tinghong Chen, Jiangtao Feng, Jiangjie Chen, Weinan Dai, Qiying Yu, Ya-Qin Zhang, Wei-Ying Ma, Jingjing Liu, Mingxuan Wang, Hao Zhou

Reinforcement Learning & Agents · Thu, Apr 23 · 10:54 AM–11:04 AM · Amphitheater · Avg rating: 6.50 (4–8)
Author-provided TL;DR

We propose MemAgent, a novel agent workflow for long-text processing that demonstrates exceptional extrapolation and performance on large-scale tasks after RL training.

Abstract

Despite improvements from length extrapolation, efficient attention, and memory modules, handling infinitely long documents without performance degradation during extrapolation remains the ultimate challenge in long-text processing. To solve this problem, we introduce a novel agent workflow, MemAgent, which processes text in segments and updates memory through an overwrite strategy, addressing long-context tasks through enhanced memory management. We further extend the DAPO algorithm to directly optimize memory ability in an end-to-end fashion, facilitating training via independent-context multi-conversation generation. Experimental results demonstrate that MemAgent has superb long-context capabilities: it extrapolates from an 8K context to a 3.5M-token QA task with a performance loss of less than 10% and achieves over 95% on the 512K NIAH test.
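The segment-and-overwrite workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the chunk size, prompt wording, and the `llm` callable are all assumptions for the sketch.

```python
def memagent_answer(llm, document, question, chunk_size=4096):
    """Sketch of a segment-and-overwrite memory workflow: the model reads
    the document chunk by chunk, each time rewriting a bounded textual
    memory instead of growing the context window."""
    memory = ""  # fixed-size memory, fully overwritten at every step
    chunks = [document[i:i + chunk_size]
              for i in range(0, len(document), chunk_size)]
    for chunk in chunks:
        # Each call sees only (question, current memory, one chunk), so the
        # context length stays bounded regardless of document length.
        memory = llm(
            f"Question: {question}\n"
            f"Current memory: {memory}\n"
            f"New text: {chunk}\n"
            "Rewrite the memory, keeping only information "
            "relevant to the question."
        )
    # The final answer is produced from the memory alone.
    return llm(f"Question: {question}\nMemory: {memory}\nAnswer:")
```

Because every call has the same bounded input size, the number of LLM calls grows linearly with document length while per-call cost stays constant, which is what allows extrapolation far beyond the training context.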

One-sentence summary·Auto-generated by claude-haiku-4-5-20251001

MemAgent uses RL-trained memory modules to enable LLMs to extrapolate from 8K to 3.5M token contexts with minimal performance degradation.

Contributions·Auto-generated by claude-haiku-4-5-20251001
  • Introduces MemAgent, an RL-trained memory agent for long-context processing that selectively records relevant information
  • Extends DAPO algorithm for end-to-end optimization of memory ability via independent-context multi-conversation generation
  • Demonstrates extrapolation from 8K training context to 3.5M tokens with less than 10% performance loss
Methods used·Auto-generated by claude-haiku-4-5-20251001
  • Reinforcement learning
  • Memory agents
  • DAPO algorithm
Datasets used·Auto-generated by claude-haiku-4-5-20251001
  • NIAH (Needle-in-a-Haystack) test
  • LongBench
Limitations (author-stated)·Auto-generated by claude-haiku-4-5-20251001

The authors did not state explicit limitations.

Future work (author-stated)·Auto-generated by claude-haiku-4-5-20251001
  • Develop more advanced memory architectures and training strategies to enhance the long-context capabilities of LLMs

Author keywords

  • LLM
  • memory
  • agent
  • RLVR
