Any capacity-limited observer induces a canonical quotient on its environment: two situations that no bounded agent can distinguish are, for that agent, the same. We formalise this for finite POMDPs. ...
Online reinforcement learning in infinite-horizon Markov decision processes (MDPs) remains less theoretically and algorithmically developed than its episodic counterpart, with many algorithms sufferin...
We investigate the theoretical aspects of offline reinforcement learning (RL) under general function approximation. While prior works (e.g., Xie et al., 2021) have established the theoretical foundati...