Non-Asymptotic Analysis of (Sticky) Track-and-Stop
Riccardo Poiani, Martino Bernasconi, Andrea Celli
We derive non-asymptotic guarantees for the Track-and-Stop and Sticky Track-and-Stop algorithms.
Abstract
In pure exploration problems, a statistician sequentially collects information to answer a question about some stochastic and unknown environment. The probability of returning a wrong answer should not exceed a maximum risk parameter $\delta$, and good algorithms make as few queries to the environment as possible. The Track-and-Stop algorithm is a pioneering method for solving these problems. Specifically, it is well known to enjoy asymptotically optimal sample complexity guarantees as $\delta \to 0$ whenever the map from the environment to its correct answers is single-valued (e.g., best-arm identification with a unique optimal arm). The Sticky Track-and-Stop algorithm extends these results to settings where, for each environment, there might exist multiple correct answers (e.g., $\epsilon$-optimal arm identification). Although both methods are optimal in the asymptotic regime, their non-asymptotic guarantees remain unknown. In this work, we fill this gap and provide non-asymptotic guarantees for both algorithms.
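For readers unfamiliar with the algorithm being analyzed, below is a minimal sketch of the Track-and-Stop template in the single-answer, Gaussian best-arm identification setting the abstract alludes to: track a plug-in estimate of the oracle sampling proportions with forced exploration, and stop via a generalized-likelihood-ratio test. The unit-variance model, the stopping threshold, and all function names are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from scipy.optimize import minimize


def oracle_weights(mu):
    """Plug-in optimal sampling proportions for Gaussian best-arm
    identification with unit variance: maximize the worst-case
    transportation cost over the probability simplex.
    Numerical sketch only; not the paper's exact oracle."""
    K = len(mu)
    best = int(np.argmax(mu))

    def neg_worst_case(w):
        # Smallest two-arm GLR exponent between the empirical best arm
        # and each alternative arm b; Track-and-Stop tracks the maximizer.
        vals = [
            w[best] * w[b] / (w[best] + w[b] + 1e-12)
            * (mu[best] - mu[b]) ** 2 / 2.0
            for b in range(K) if b != best
        ]
        return -min(vals)

    res = minimize(
        neg_worst_case,
        np.full(K, 1.0 / K),
        bounds=[(1e-6, 1.0)] * K,
        constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
        method="SLSQP",
    )
    w = np.clip(res.x, 1e-6, 1.0)
    return w / w.sum()


def track_and_stop(sample, K, delta, max_steps=100_000):
    """D-tracking sampling rule plus a GLR (Chernoff-style) stopping rule
    for K >= 2 arms.  `sample(a)` draws one observation from arm a.
    The stopping threshold is a common stylized choice, not a calibrated one."""
    counts = np.zeros(K)
    sums = np.zeros(K)
    for a in range(K):                      # initialization: pull each arm once
        counts[a] += 1
        sums[a] += sample(a)
    for t in range(K, max_steps):
        mu_hat = sums / counts
        best = int(np.argmax(mu_hat))
        # GLR statistic against the closest alternative answer.
        glr = min(
            counts[best] * counts[b] / (counts[best] + counts[b])
            * (mu_hat[best] - mu_hat[b]) ** 2 / 2.0
            for b in range(K) if b != best
        )
        if glr > np.log((1 + np.log(t)) / delta):   # stylized threshold
            return best, t                           # answer, samples used
        # Forced exploration keeps every count growing (D-tracking).
        starved = np.where(counts < np.sqrt(t) - K / 2.0)[0]
        if len(starved) > 0:
            a = int(starved[np.argmin(counts[starved])])
        else:
            a = int(np.argmax(t * oracle_weights(mu_hat) - counts))
        counts[a] += 1
        sums[a] += sample(a)
    return int(np.argmax(sums / counts)), max_steps
```

For instance, with three unit-variance Gaussian arms one might call `track_and_stop(lambda a: np.random.normal([0.5, 0.4, 0.3][a], 1.0), K=3, delta=0.05)`; the returned stopping time is the quantity whose non-asymptotic behavior the paper studies.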
Provides the first finite-confidence analysis of the Track-and-Stop and Sticky Track-and-Stop algorithms for pure exploration problems.
- First finite-confidence characterization of the Track-and-Stop algorithm's performance
- Finite-confidence guarantees for Sticky Track-and-Stop in multiple-answer settings
- Recovers asymptotic optimality (formalized in the equations after this list) and provides theoretical support for the algorithms' practical performance
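To make the asymptotic-versus-non-asymptotic distinction concrete, the standard fixed-confidence formalization for the single-answer case reads as follows (the notation $T^*(\mu)$, $\mathrm{Alt}(\mu)$, and $\Delta_K$ is standard in the Track-and-Stop literature and is assumed here rather than quoted from the paper):

```latex
% Lower bound for any \delta-correct strategy, and the characteristic time:
\[
  \mathbb{E}_{\mu}[\tau_\delta] \;\ge\; T^*(\mu)\,\mathrm{kl}(\delta,\,1-\delta),
  \qquad
  T^*(\mu)^{-1} \;=\; \sup_{w \in \Delta_K}\;\inf_{\lambda \in \mathrm{Alt}(\mu)}
      \sum_{a=1}^{K} w_a\,\mathrm{KL}(\mu_a,\lambda_a).
\]
% Asymptotic optimality of (Sticky) Track-and-Stop matches this bound
% only in the limit \delta -> 0:
\[
  \limsup_{\delta \to 0}\;\frac{\mathbb{E}_{\mu}[\tau_\delta]}{\log(1/\delta)}
  \;\le\; T^*(\mu).
\]
```

Per the abstract, the paper complements this limit statement with guarantees on $\mathbb{E}_{\mu}[\tau_\delta]$ that hold at every fixed $\delta$.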
- Pure exploration
- Multi-armed bandits
- Best-arm identification
- Information theory
The authors did not state explicit limitations.
Future directions (from the paper)
- Remove the problem-dependent constant from the finite-confidence analysis of Sticky Track-and-Stop through modified sampling rules
- Close the gap between lower and upper bounds in the finite-confidence regime
- Extend the analysis to infinite-answer problems with asymptotic optimality guarantees
- Develop tight dependencies on $\log(1/\delta)$ and other instance parameters
Author keywords
- Multi-Armed Bandit Theory
- Pure Exploration
- Fixed-Confidence