Online learning in arbitrary, and possibly adversarial, environments has been extensively studied in sequential decision-making, and it is closely connected to equilibrium computation in game theory. ...
We study calibeating, the problem of post-processing external forecasts online to minimize cumulative losses and match an informativeness-based benchmark. Unlike prior work, which analyzed calibeating...
The prevailing paradigm for improving large language models relies on offline training with human annotations or simulated environments, leaving the rich experience accumulated during real-world deplo...
This paper studies the optimality and complexity of the Follow-the-Perturbed-Leader (FTPL) policy in $m$-set semi-bandit problems. FTPL has been studied extensively as a promising candidate for an efficien...