Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search

Zhiyu Mou, Yiqin Lv, Miao Xu, Cheems Wang, Yixiu Mao, Jinghao Chen, Qichen Ye, Chao Li, Rongquan Bai, Chuan Yu, Jian Xu, Bo Zheng

Reinforcement Learning & Agents · Sat, Apr 25 · 4:15 PM–4:25 PM · Amphitheater · Avg rating: 6.00 (4–8)

OpenReview ↗ arXiv ↗ PDF ↗ iclr.cc ↗

Abstract

Auto-bidding is a critical tool for advertisers to improve advertising performance. Recent progress has demonstrated that AI-Generated Bidding (AIGB), which learns a conditional generative planner from offline data, achieves superior performance compared to typical offline reinforcement learning (RL)-based auto-bidding methods. However, existing AIGB methods still face a performance bottleneck because they cannot use feedback to explore beyond the static dataset. To address this, we propose AIGB-Pearl (Planning with EvaluAtor via RL), a novel method that integrates generative planning and policy optimization. The core of AIGB-Pearl lies in constructing a trajectory evaluator to score the quality of generated trajectories and designing a provably sound KL-Lipschitz-constrained score-maximization scheme to ensure safe and efficient exploration beyond the offline dataset. A practical algorithm that incorporates the synchronous coupling technique is further developed to ensure the model regularity required by the proposed scheme. Extensive experiments on both simulated and real-world advertising systems demonstrate the state-of-the-art performance of our approach.
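To make the idea of KL-constrained score maximization concrete, here is a minimal toy sketch. It is not the AIGB-Pearl algorithm: it assumes a small discrete set of candidate trajectories, a fixed score per trajectory standing in for the evaluator's output, and a known behavior distribution standing in for the offline data. The Lipschitz constraint and synchronous coupling are not modeled. The function names and the temperature parameter `beta` are hypothetical. The sketch relies only on the standard result that maximizing E_pi[score] - beta * KL(pi || behavior) over distributions pi yields pi proportional to behavior * exp(score / beta).

```python
import math


def kl_regularized_policy(behavior_probs, scores, beta):
    """Closed-form maximizer of E_pi[score] - beta * KL(pi || behavior).

    behavior_probs: strictly positive probabilities over candidate trajectories
                    (a stand-in for the offline data distribution).
    scores:         one evaluator score per candidate (hypothetical values).
    beta:           penalty weight; larger beta keeps pi closer to behavior.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if any(p <= 0 for p in behavior_probs):
        raise ValueError("behavior_probs must be strictly positive")
    # Unnormalized log-weights: log behavior + score / beta.
    logits = [math.log(p) + s / beta for p, s in zip(behavior_probs, scores)]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with q strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def expected_score(probs, scores):
    """Expected evaluator score under a distribution over candidates."""
    return sum(p * s for p, s in zip(probs, scores))


if __name__ == "__main__":
    behavior = [0.5, 0.3, 0.2]   # toy offline distribution
    scores = [1.0, 2.0, 4.0]     # toy evaluator scores
    for beta in (10.0, 1.0, 0.1):
        pi = kl_regularized_policy(behavior, scores, beta)
        print(beta, expected_score(pi, scores), kl_divergence(pi, behavior))
```

In this toy setting, lowering `beta` raises the expected score but moves the policy further from the behavior distribution, which is the trade-off a KL constraint is meant to control.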

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

AIGB-Pearl enhances generative auto-bidding with a trajectory evaluator and a KL-Lipschitz-constrained score-maximization scheme for safe exploration beyond the offline data.

Contributions · Auto-generated by claude-haiku-4-5-20251001

A trajectory evaluator that scores the quality of generated bidding trajectories
A KL-Lipschitz-constrained score-maximization scheme for safe and efficient exploration beyond the offline dataset
A practical algorithm using the synchronous coupling technique to ensure the required model regularity

Methods used · Auto-generated by claude-haiku-4-5-20251001

Auto-bidding
Generative planning
Reinforcement learning
Offline RL
Advertising

Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit limitations.

Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit future directions.

Author keywords

auto-bidding
offline reinforcement learning
generative decision making
