4 papers - avg viability 3.8
Develops online learning algorithms that use ranking feedback instead of numerical utilities. The approach applies to human-in-the-loop systems and game theory and is shown to be effective for LLM routing.
A theoretical framework for improving external forecasts through online post-processing to minimize cumulative losses.
A framework for language models to improve continuously from real-world deployment experiences.
An efficient algorithm for the $m$-set semi-bandit problem with optimal regret guarantees.