Non-Asymptotic Analysis of (Sticky) Track-and-Stop
Riccardo Poiani, Martino Bernasconi, Andrea Celli
We derive non-asymptotic guarantees for the Track-and-Stop and Sticky Track-and-Stop algorithms.
Abstract
In pure exploration problems, a statistician sequentially collects information to answer a question about some stochastic and unknown environment. The probability of returning a wrong answer should not exceed a maximum risk parameter $\delta$, and good algorithms make as few queries to the environment as possible. The Track-and-Stop algorithm is a pioneering method for solving these problems. Specifically, it is well known to enjoy asymptotically optimal sample complexity guarantees as $\delta \to 0$ whenever the map from the environment to its correct answers is single-valued (e.g., best-arm identification with a unique optimal arm). The Sticky Track-and-Stop algorithm extends these results to settings where, for each environment, there might exist multiple correct answers (e.g., $\epsilon$-optimal arm identification). Although both methods are optimal in the asymptotic regime, their non-asymptotic guarantees remain unknown. In this work, we fill this gap and provide non-asymptotic guarantees for both algorithms.
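For readers unfamiliar with the algorithm being analyzed, below is a minimal sketch of the Track-and-Stop template in the single-answer, Gaussian best-arm identification setting the abstract alludes to: track a plug-in estimate of the oracle sampling proportions with forced exploration, and stop via a generalized-likelihood-ratio test. The unit-variance model, the stopping threshold, and all function names are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from scipy.optimize import minimize


def oracle_weights(mu):
    """Plug-in optimal sampling proportions for Gaussian best-arm
    identification with unit variance: maximize the worst-case
    transportation cost over the probability simplex.
    Numerical sketch only; not the paper's exact oracle."""
    K = len(mu)
    best = int(np.argmax(mu))

    def neg_worst_case(w):
        # Smallest two-arm GLR exponent between the empirical best arm
        # and each alternative arm b; Track-and-Stop tracks the maximizer.
        vals = [
            w[best] * w[b] / (w[best] + w[b] + 1e-12)
            * (mu[best] - mu[b]) ** 2 / 2.0
            for b in range(K) if b != best
        ]
        return -min(vals)

    res = minimize(
        neg_worst_case,
        np.full(K, 1.0 / K),
        bounds=[(1e-6, 1.0)] * K,
        constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
        method="SLSQP",
    )
    w = np.clip(res.x, 1e-6, 1.0)
    return w / w.sum()


def track_and_stop(sample, K, delta, max_steps=100_000):
    """D-tracking sampling rule plus a GLR (Chernoff-style) stopping rule
    for K >= 2 arms.  `sample(a)` draws one observation from arm a.
    The stopping threshold is a common stylized choice, not a calibrated one."""
    counts = np.zeros(K)
    sums = np.zeros(K)
    for a in range(K):                      # initialization: pull each arm once
        counts[a] += 1
        sums[a] += sample(a)
    for t in range(K, max_steps):
        mu_hat = sums / counts
        best = int(np.argmax(mu_hat))
        # GLR statistic against the closest alternative answer.
        glr = min(
            counts[best] * counts[b] / (counts[best] + counts[b])
            * (mu_hat[best] - mu_hat[b]) ** 2 / 2.0
            for b in range(K) if b != best
        )
        if glr > np.log((1 + np.log(t)) / delta):   # stylized threshold
            return best, t                           # answer, samples used
        # Forced exploration keeps every count growing (D-tracking).
        starved = np.where(counts < np.sqrt(t) - K / 2.0)[0]
        if len(starved) > 0:
            a = int(starved[np.argmin(counts[starved])])
        else:
            a = int(np.argmax(t * oracle_weights(mu_hat) - counts))
        counts[a] += 1
        sums[a] += sample(a)
    return int(np.argmax(sums / counts)), max_steps
```

For instance, with three unit-variance Gaussian arms one might call `track_and_stop(lambda a: np.random.normal([0.5, 0.4, 0.3][a], 1.0), K=3, delta=0.05)`; the returned stopping time is the quantity whose non-asymptotic behavior the paper studies.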
Provides the first finite-confidence analysis of the Track-and-Stop and Sticky Track-and-Stop algorithms for pure exploration problems.
- First finite-confidence characterization of the Track-and-Stop algorithm's performance
- Finite-confidence guarantees for Sticky Track-and-Stop in multiple-answer settings
- Recovers asymptotic optimality (formalized in the equations after this list) and provides theoretical support for the algorithms' practical performance
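To make the asymptotic-versus-non-asymptotic distinction concrete, the standard fixed-confidence formalization for the single-answer case reads as follows (the notation $T^*(\mu)$, $\mathrm{Alt}(\mu)$, and $\Delta_K$ is standard in the Track-and-Stop literature and is assumed here rather than quoted from the paper):

```latex
% Lower bound for any \delta-correct strategy, and the characteristic time:
\[
  \mathbb{E}_{\mu}[\tau_\delta] \;\ge\; T^*(\mu)\,\mathrm{kl}(\delta,\,1-\delta),
  \qquad
  T^*(\mu)^{-1} \;=\; \sup_{w \in \Delta_K}\;\inf_{\lambda \in \mathrm{Alt}(\mu)}
      \sum_{a=1}^{K} w_a\,\mathrm{KL}(\mu_a,\lambda_a).
\]
% Asymptotic optimality of (Sticky) Track-and-Stop matches this bound
% only in the limit \delta -> 0:
\[
  \limsup_{\delta \to 0}\;\frac{\mathbb{E}_{\mu}[\tau_\delta]}{\log(1/\delta)}
  \;\le\; T^*(\mu).
\]
```

Per the abstract, the paper complements this limit statement with guarantees on $\mathbb{E}_{\mu}[\tau_\delta]$ that hold at every fixed $\delta$.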
- Pure exploration
- Multi-armed bandits
- Best-arm identification
- Information theory
The authors did not state explicit limitations.
Future directions (from the paper)
- Remove the problem-dependent constant from the finite-confidence analysis of Sticky Track-and-Stop through modified sampling rules
- Close the gap between lower and upper bounds in the finite-confidence regime
- Extend the analysis to infinite-answer problems with asymptotic optimality guarantees
- Develop tight dependencies on $\log(1/\delta)$ and other instance parameters
Author keywords
- Multi-Armed Bandit Theory
- Pure Exploration
- Fixed-Confidence