Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
Yueqi Song, Ketan Ramaneti, Zaid Sheikh, Ziru Chen, Boyu Gou, Tianbao Xie, Yiheng Xu, Danyang Zhang, Apurva Gandhi, Fan Yang, Joseph Liu, Tianyue Ou, Zhihao Yuan, Frank F. Xu, Shuyan Zhou, Xingyao Wang, Xiang Yue, Tao Yu, Huan Sun, Yu Su, Graham Neubig
We propose Agent Data Protocol (ADP), a lightweight "interlingua" schema that standardizes heterogeneous agent trajectories so datasets can plug into multiple agent SFT pipelines without per-dataset engineering.
Abstract
Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue that the bottleneck is not a lack of underlying data sources, but that a large variety of data is fragmented across heterogeneous formats, tools, and interfaces. To address this, we introduce the Agent Data Protocol (ADP), a lightweight representation language that serves as an "interlingua" between agent datasets in diverse formats and unified agent training pipelines downstream. ADP is expressive enough to capture a large variety of tasks, including API/tool use, browsing, coding, software engineering, and general agentic workflows, while remaining simple to parse and train on without per-dataset engineering. In experiments, we unified a broad collection of 13 existing agent training datasets into ADP format and converted the standardized ADP data into training-ready formats for multiple agent frameworks. Supervised finetuning on the unified data yields an average performance gain of ~20% over corresponding base models and delivers state-of-the-art or near-SOTA performance on standard coding, browsing, tool use, and research benchmarks, without domain-specific tuning. All code and data are released publicly, in the hope that ADP can help lower the barrier to standardized, scalable, and reproducible agent training.
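The abstract does not reproduce the schema itself; for a concrete sense of what an agent-trajectory "interlingua" could look like, here is a minimal sketch in Python. All field and class names below are illustrative assumptions, not the released ADP specification.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional

# Illustrative sketch only: names are assumptions, not the released ADP spec.

@dataclass
class Action:
    kind: Literal["api_call", "code", "browse", "message"]  # action type
    content: str                                            # tool call, code, or text
    tool_name: Optional[str] = None                         # set for api_call actions

@dataclass
class Step:
    observation: str               # environment feedback preceding the action
    action: Action                 # the agent's response at this step
    thought: Optional[str] = None  # reasoning trace, if the source dataset has one

@dataclass
class Trajectory:
    task: str                      # natural-language task description
    steps: list[Step] = field(default_factory=list)
    source_dataset: str = ""       # provenance, e.g. which of the 13 datasets
```

The point of such a schema is that each source dataset needs only one converter into it, and each training framework only one renderer out of it, instead of a converter per dataset-framework pair.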
ADP is a lightweight protocol that unifies 13 heterogeneous agent datasets into a single training schema, achieving a ~20% average performance gain over base models.
- Agent Data Protocol as lightweight interlingua for diverse agent dataset formats and training pipelines
- Unified schema capturing API use, browsing, coding, software engineering, and agentic workflows
- Standardized training achieving SOTA or near-SOTA on coding, browsing, tool use, and research benchmarks (see the conversion sketch after this list)
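The paper's released converters target specific agent frameworks; as a minimal sketch of the "standardize once, render many" idea, assuming the illustrative Trajectory dataclass from the previous block, a generic renderer into chat-style SFT messages might look like this. The role layout is an assumption, not the paper's pipeline.

```python
def to_sft_messages(traj: "Trajectory") -> list[dict]:
    """Render a sketched ADP-style trajectory into a generic chat-format
    SFT example. `Trajectory` is the illustrative dataclass sketched above;
    each agent framework would supply its own renderer like this one."""
    messages = [{"role": "user", "content": traj.task}]
    for step in traj.steps:
        if step.observation:
            # Environment feedback is shown to the model as a user turn.
            messages.append({"role": "user", "content": step.observation})
        reply = step.action.content
        if step.thought:
            # Prepend the reasoning trace when the source dataset provides one.
            reply = f"{step.thought}\n{reply}"
        messages.append({"role": "assistant", "content": reply})
    return messages
```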
- dataset standardization
- protocol design
- supervised fine-tuning
- 13 agent training datasets
The authors did not state explicit limitations.
Future directions
- Extend ADP beyond text to images, screen recordings, and multimodal data (from the paper)
- Standardize evaluation and environments for cleaner composition (from the paper)
- Strengthen automated validation and dataset conversion for sustained scaling (from the paper)
Author keywords
- agent
- training
- data
- standardization
Related orals
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Benchmarks practical privacy risks in differential privacy-adapted LLMs, revealing that distribution shifts and model choice impact effectiveness.
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
Proposes Recursive Likelihood Ratio optimizer for efficient fine-tuning of diffusion models with lower variance gradient estimation.
Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
Demonstrates that LLMs can be finetuned to generate harmful, steganographically hidden outputs while appearing benign to safety systems.
Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents
Proposes T3 algorithm to detect belief deviation in LLM agents and truncate trajectories for improved reinforcement learning in active reasoning tasks.
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
RefineStat enforces semantic constraints and applies diagnostic-aware refinement for synthesizing valid probabilistic programs from smaller language models.