Speculative Actions: A Lossless Framework for Faster AI Agents
Naimeng Ye, Arnav Ahuja, Georgios Liargkovas, Yunan Lu, Kostis Kaffes, Tianyi Peng
We introduce speculative actions, a lossless framework that uses faster models to predict likely actions, enabling multiple API calls to execute in parallel and thus yielding substantial acceleration.
Abstract
AI agents are increasingly deployed in complex, interactive environments, yet their runtime remains a major bottleneck for training, evaluation, and real-world use. Typical agent behavior unfolds sequentially, where each action requires an API call that can incur substantial latency. For example, a game of chess between two state-of-the-art agents can take hours. We introduce speculative actions, a lossless acceleration framework for general agentic systems. Inspired by speculative execution in microprocessors and speculative decoding in LLM inference, our method uses faster models to predict likely future actions and executes them in parallel, committing results only when the predictions match. We evaluate speculative actions across gaming, e-commerce, and web search environments, and additionally study a lossy extension in an operating systems setting. Across domains, we achieve up to 55% next-action prediction accuracy, which translates into substantial latency reductions. Finally, we present a cost-latency analysis that formalizes the tradeoff between speculative breadth and time savings. This analysis enables principled tuning and selective branch launching, ensuring that multi-branch speculation delivers practical speedups without prohibitive cost growth.
Speculative Actions accelerates agent systems by predicting and executing likely future actions in parallel.
- Lossless framework breaking sequential interaction loops through prediction and parallelization
- Treats every step (LLM call, tool invocation, MCP request) as an API call subject to prediction and parallelization
- Up to 55% next-action prediction accuracy translating to substantial latency reductions
- Cost-latency analysis formalizing tradeoffs between speculative breadth and time savings
- Speculative execution
- Action prediction
- Parallelization
- Multi-branch speculation
Authors did not state explicit limitations.
Authors did not state explicit future directions.
Author keywords
- AI Agents
- Speculative Decoding
- Parallel Execution
- Agentic Serving
- Agentic Simulation
Related orals
TileLang: Bridge Programmability and Performance in Modern Neural Kernels
TileLang enables hardware-aware fused kernel programming with tile inference, achieving 5-6x speedups on recommendation workloads.
Probabilistic Kernel Function for Fast Angle Testing
Proposes probabilistic kernel functions for angle testing enabling efficient approximate nearest neighbor search.
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
Generates minute-long high-resolution videos efficiently with linear attention and constant-memory KV cache for block autoregression.
Efficient Resource-Constrained Training of Transformers via Subspace Optimization
WASI applies subspace-based training to transformer models reducing memory by 62x and FLOPs by 2x while maintaining accuracy on edge devices.
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
Analyzes low-precision flash attention training failure caused by low-rank representations and biased BF16 rounding errors.