ICLR 2026 Orals

In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Zhuofeng Li, Haoxiang Zhang, Seungju Han, Sheng Liu, Jianwen Xie, Yu Zhang, Yejin Choi, James Zou, Pan Lu

LLMs & Reasoning · Fri, Apr 24 · 10:54 AM–11:04 AM · Amphitheater · Avg rating: 7.33 (range 6–8)
Author-provided TL;DR

We introduce AgentFlow, a trainable agentic system, and Flow-GRPO, an on-policy RL algorithm that optimizes the planner "in-the-flow" by broadcasting a final outcome reward to all steps, enabling effective long-horizon planning and tool use.
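To make the TL;DR concrete, here is a minimal Python sketch of the four-module loop it describes: a planner, executor, verifier, and generator coordinated through an evolving memory, with only the planner trained in-the-flow. The module signatures and names are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the AgentFlow-style loop described above, assuming
# illustrative callables for each module; this is not the authors' code.
def agentflow_loop(task, planner, executor, verifier, generator, max_turns=10):
    """Coordinate planner/executor/verifier/generator via an evolving memory."""
    memory = []  # evolving memory shared across turns
    for _ in range(max_turns):
        plan = planner(task, memory)      # the only module optimized in-the-flow
        result = executor(plan)           # e.g. execute a tool call
        memory.append((plan, result))     # memory evolves with each turn
        if verifier(task, memory):        # stop once the verifier accepts
            break
    return generator(task, memory)        # produce the final answer
```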

Abstract

Outcome-driven reinforcement learning has advanced reasoning in large language models (LLMs), but prevailing tool-augmented approaches train a single, monolithic policy that interleaves thoughts and tool calls under full context; this scales poorly with long horizons and diverse tools and generalizes weakly to new scenarios. Agentic systems offer a promising alternative by decomposing work across specialized modules, yet most remain training-free or rely on offline training decoupled from the live dynamics of multi-turn interaction. We introduce AgentFlow, a trainable, *in-the-flow* agentic framework that coordinates four modules (planner, executor, verifier, generator) through an evolving memory and directly optimizes its planner inside the multi-turn loop. To train on-policy in live environments, we propose *Flow-based Group Refined Policy Optimization* (Flow-GRPO), which tackles long-horizon, sparse-reward credit assignment by converting multi-turn optimization into a sequence of tractable single-turn policy updates. It broadcasts a single, verifiable trajectory-level outcome to every turn to align local planner decisions with global success and stabilizes learning with group-normalized advantages. Across ten benchmarks, AgentFlow with a 7B-scale backbone outperforms top-performing baselines with average accuracy gains of 14.9% on search, 14.0% on agentic, 14.5% on mathematical, and 4.1% on scientific tasks, even surpassing larger proprietary models like GPT-4o. Further analyses confirm the benefits of in-the-flow optimization, showing improved planning, enhanced tool-calling reliability, and positive scaling with model size and reasoning turns.
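The core algorithmic claim is the credit-assignment scheme: a single verifiable trajectory-level outcome reward is broadcast to every turn and normalized within a group of rollouts of the same task. Below is a minimal sketch of that computation, assuming a simple trajectory record; the surrounding policy-gradient machinery is omitted and the names are not from the paper's code.

```python
# A sketch of Flow-GRPO-style credit assignment as described in the abstract:
# the trajectory-level outcome reward is group-normalized, then broadcast
# unchanged to every turn. Names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Trajectory:
    turns: list            # planner actions taken at each turn
    outcome_reward: float  # single verifiable trajectory-level reward (e.g. 0/1)

def flow_grpo_advantages(group: list[Trajectory], eps: float = 1e-8) -> list[list[float]]:
    """Per-turn advantages for a group of rollouts of the same task."""
    rewards = [t.outcome_reward for t in group]
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    advantages = []
    for t in group:
        a = (t.outcome_reward - mean) / (std + eps)  # group-normalized advantage
        advantages.append([a] * len(t.turns))        # broadcast to every turn
    return advantages
```

Because every turn in a trajectory shares the same group-normalized advantage, each turn's planner update reduces to a standard single-turn policy-gradient step, which is how the abstract's "sequence of tractable single-turn policy updates" arises.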

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

AgentFlow is a trainable, in-the-flow agentic system that uses Flow-GRPO for on-policy learning under long-horizon, sparse rewards.

Contributions · Auto-generated by claude-haiku-4-5-20251001
  • Modular agentic framework with in-the-flow planning optimization inside multi-turn loop
  • Flow-based Group Refined Policy Optimization converting multi-turn RL to tractable single-turn policy updates
  • Demonstration of improved planning, tool-calling reliability, and positive scaling with model size
Methods used · Auto-generated by claude-haiku-4-5-20251001
  • reinforcement learning
  • group normalized advantages
  • policy optimization
Datasets used · Auto-generated by claude-haiku-4-5-20251001
  • search benchmarks
  • agentic benchmarks
  • mathematical benchmarks
  • scientific benchmarks
Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • Tools are limited to general information search, with less domain-specific information (from the paper)
  • Not suitable for video search or analysis on YouTube (from the paper)
Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit future directions.

Author keywords

  • Reinforcement Learning
  • Large Language Models
  • Agentic Systems
  • Tool Use
  • Planning
  • On-policy Optimization
  • Sparse Rewards

