AgentGym-RL: An Open-Source Framework to Train LLM Agents for Long-Horizon Decision Making via Multi-Turn RL
Zhiheng Xi, Jixuan Huang, Chenyang Liao, Baodai Huang, Jiaqi Liu, Honglin Guo, Yajie Yang, Rui Zheng, Junjie Ye, Jiazheng Zhang, Wenxiang Chen, Wei He, Yiwen Ding, Guanyu Li, Zehui Chen, Zhengyin Du, Xuesong Yao, Yufei Xu, Jiecao Chen, Tao Gui, Zuxuan Wu, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang
We present AgentGym-RL, a unified open-source framework for training LLM agents from scratch across diverse and realistic environments, and propose ScalingInter-RL, a staged training strategy for stable long-horizon RL training.
Abstract
Training LLM agents for complex multi-turn decision-making tasks requires extensive exploration within their environment, making reinforcement learning (RL) a natural approach. However, the open-source community currently lacks a unified RL framework capable of training agents from scratch across diverse and realistic environments. To bridge this gap, we introduce AgentGym-RL, a modular and decoupled framework specifically designed for RL-based agent training in multi-turn decision-making tasks. It offers high flexibility and extensibility, supports mainstream RL algorithms, and spans a broad range of real-world scenarios. To train agents effectively on challenging tasks, we argue that they must expand their external interactions with the environment, rather than relying solely on internal reasoning. However, training agents for long-horizon interaction with vanilla methods often suffers from training instability. To this end, we propose ScalingInter-RL, a staged training approach for stable long-horizon RL. It starts with short-horizon interaction to establish foundational policies and progressively lengthens the interaction horizon to encourage deeper exploration. Extensive experiments show that agents trained with our method achieve performance on par with—or even surpassing—commercial counterparts like OpenAI o3 and Gemini-2.5-Pro across 27 tasks in diverse environments. We share key insights and release the full framework, including code and datasets, to empower the community in building the next generation of intelligent agents. Our framework is available at https://github.com/WooooDyy/AgentGym-RL.
Presents a unified RL framework for training LLM agents on long-horizon decision-making, with staged interaction scaling.
- Develops a modular and decoupled RL framework supporting mainstream algorithms across diverse environments
- Proposes ScalingInter-RL, a staged training approach that starts with short-horizon interaction and progressively expands it
- Demonstrates that agents trained with the method achieve performance on par with OpenAI o3 and Gemini-2.5-Pro across 27 tasks
- Releases full framework including code and datasets to community
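The core idea of ScalingInter-RL described above—capping the agent's interaction turns early in training and progressively raising the cap—can be sketched as a simple stage-based horizon schedule. The function name, stage boundaries, and turn limits below are illustrative assumptions, not the paper's actual configuration:

```python
# Hypothetical sketch of a staged interaction-horizon schedule in the
# spirit of ScalingInter-RL: early training stages cap the number of
# agent-environment turns so the agent learns stable short-horizon
# policies; later stages raise the cap to encourage deeper exploration.
# Stage boundaries and turn limits here are illustrative only.

def interaction_horizon(step, stages=((0, 5), (2000, 10), (5000, 20))):
    """Return the maximum interaction turns allowed at a training step.

    `stages` is a sequence of (start_step, max_turns) pairs sorted by
    start_step; the horizon is the max_turns of the latest stage whose
    start_step has been reached.
    """
    horizon = stages[0][1]
    for start_step, max_turns in stages:
        if step >= start_step:
            horizon = max_turns
    return horizon

# The horizon expands as training progresses:
print(interaction_horizon(0))     # 5 turns in the initial stage
print(interaction_horizon(3000))  # 10 turns in the middle stage
print(interaction_horizon(9000))  # 20 turns in the final stage
```

In a rollout loop, the agent's episode would simply be truncated at `interaction_horizon(step)` turns; the staged cap replaces a fixed maximum horizon.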
- Reinforcement learning
- Multi-turn decision-making
Authors did not state explicit limitations.
Authors did not state explicit future directions.
Author keywords
- large language model
- LLM-based agent
- decision-making
Related orals
Mastering Sparse CUDA Generation through Pretrained Models and Deep Reinforcement Learning
SparseRL leverages deep RL and pretrained models to generate high-performance CUDA code for sparse matrix operations.
Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling
DECS framework reduces reasoning model overthinking by decoupling necessary from redundant tokens via curriculum scheduling.
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
MemAgent uses RL-trained memory modules to enable LLMs to extrapolate from 8K to 3.5M token contexts with minimal performance degradation.
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
DiffusionNFT enables efficient online reinforcement learning for diffusion models via forward process optimization with up to 25x efficiency gains.
Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport
Hyperparameter Trajectory Inference uses conditional Lagrangian optimal transport to reconstruct neural network outputs across hyperparameter spectra without expensive retraining.