AgentGym-RL: An Open-Source Framework to Train LLM Agents for Long-Horizon Decision Making via Multi-Turn RL
Presents unified RL framework for training LLM agents on long-horizon decision-making with staged interaction scaling.
Reinforcement learning, decision-making, autonomous agents, multi-agent systems, and planning.
Presents AstaBench, comprehensive benchmark suite with production-grade tools for rigorous evaluation of AI agents on scientific research tasks.
DiffMPC provides GPU-accelerated differentiable MPC solver leveraging problem structure for efficient parallelization.
DiffusionNFT enables efficient online reinforcement learning for diffusion models via forward process optimization with up to 25x efficiency gains.
Proposes Discount Model Search for quality diversity optimization in high-dimensional measure spaces.
AIGB-Pearl enhances generative auto-bidding with trajectory evaluator and KL-Lipschitz-constrained optimization for safe exploration beyond offline data.
Proposes ExDM using diffusion models for exploration and policy learning in unsupervised reinforcement learning.
RNN models of hippocampus reveal how locomotor development statistics shape emergence of spatial neural representations.
Hyperparameter Trajectory Inference uses conditional Lagrangian optimal transport to reconstruct neural network outputs across hyperparameter spectra without expensive retraining.
LPWM enables self-supervised object-centric world modeling with latent action module for stochastic video generation and control.
SparseRL leverages deep RL and pretrained models to generate high-performance CUDA code for sparse matrix operations.
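For context on the target workload (not taken from the paper): the canonical sparse kernel such generated CUDA code accelerates is the CSR sparse matrix-vector product, sketched here in plain Python.

```python
def csr_spmv(data, indices, indptr, x):
    """Sparse matrix-vector product y = A @ x with A in CSR format:
    data holds the nonzero values, indices their column ids, and
    indptr[i]:indptr[i+1] delimits row i's nonzeros."""
    y = []
    for i in range(len(indptr) - 1):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y.append(acc)
    return y

# diag(1, 2) in CSR applied to [3, 4] gives [3.0, 8.0]
print(csr_spmv([1.0, 2.0], [0, 1], [0, 1, 2], [3.0, 4.0]))
```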
MVP achieves fastest one-step action generation with instantaneous velocity constraint providing high expressiveness for robotic control.
MemAgent uses RL-trained memory modules to enable LLMs to extrapolate from 8K to 3.5M token contexts with minimal performance degradation.
MomaGraph learns unified task-oriented scene representations integrating spatial-functional relationships for embodied agents to perform planning and manipulation.
Provides first finite-confidence analysis of Track-and-Stop and Sticky Track-and-Stop algorithms for pure exploration problems.
OpenApps testbed reveals UI agent reliability varies drastically across app variations despite stable within-environment performance.
OpTI-BFM uses optimistic decision criterion modeling uncertainty over reward functions to enable efficient task inference for behavior foundation models.
DECS framework reduces reasoning model overthinking by decoupling necessary from redundant tokens via curriculum scheduling.
Q-RAG fine-tunes embedders for multi-step retrieval using reinforcement learning, achieving state-of-the-art on long-context QA.
Rodrigues Networks inject kinematics-aware inductive biases for improved action learning in articulated robot tasks.
ABOM performs task-free adaptive meta black-box optimization using online parameter adaptation without predefined task distributions.
Learns zero-shot RL representations via temporal-difference latent prediction recovering a successor factorization.
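The successor factorization builds on the standard successor-representation TD update; a generic tabular sketch (illustrative only, not the paper's latent-prediction method):

```python
def sr_td_update(psi, s, s_next, gamma=0.9, alpha=0.1):
    """One tabular successor-representation TD step:
    psi[s] <- psi[s] + alpha * (onehot(s) + gamma * psi[s_next] - psi[s]),
    so psi[s][j] estimates discounted future occupancy of state j from s."""
    n = len(psi)
    for j in range(n):
        target = (1.0 if j == s else 0.0) + gamma * psi[s_next][j]
        psi[s][j] += alpha * (target - psi[s][j])
    return psi
```

Given psi, values under any reward weights w recover as Q(s) = psi[s] . w, which is what makes the factorization useful for zero-shot transfer across rewards.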
Characterizes online learning with ranking feedback, showing sublinear regret is impossible in general but achievable under variation bounds.
TROLL replaces PPO clip objective with differentiable trust region projection for more stable and efficient LLM reward fine-tuning.
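TROLL's projection itself is not reproduced here; for context, a minimal sketch of the standard PPO clipped surrogate objective it replaces:

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Standard PPO clipped surrogate loss (per-sample mean), built on the
    hard ratio clip that trust-region projection methods aim to replace."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                      # importance ratio pi_new/pi_old
        clipped = max(1 - eps, min(1 + eps, ratio))    # clip to [1-eps, 1+eps]
        total += min(ratio * adv, clipped * adv)       # pessimistic surrogate
    return -total / len(advantages)                    # negated for minimization
```

The hard clip zeroes gradients whenever the ratio leaves [1-eps, 1+eps], which is the instability a differentiable trust-region projection is designed to avoid.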