Reinforcement learning and online learning

Online learning, multi-armed bandits

How can we design algorithms with regret bounds that balance exploration and exploitation?

How do domain-specific constraints and knowledge affect algorithm design?
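
As one concrete instance of an algorithm with a regret bound, the sketch below runs UCB1 on Bernoulli arms. UCB1 is a standard textbook algorithm whose expected regret grows logarithmically in the horizon for rewards in [0, 1]; it is shown here only as an illustration, not as the method studied in this work. The function name `ucb1`, the arm means, and the horizon are all assumptions chosen for the example.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms (illustrative sketch).

    arm_means: hypothetical true success probabilities, unknown to the learner.
    Returns (pull counts per arm, cumulative pseudo-regret).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of times each arm was pulled
    sums = [0.0] * k      # total reward collected from each arm
    best = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # Exploitation term (empirical mean) plus exploration bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: gap between the best arm's mean and the chosen arm's mean.
        regret += best - arm_means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.2, 0.5, 0.8], horizon=5000)
    print("pulls per arm:", counts)
    print("cumulative pseudo-regret:", round(regret, 1))
```

The exploration bonus shrinks as an arm is pulled more often, so suboptimal arms are sampled less and less frequently. Domain-specific constraints (for example, a budget on pulls) would change how the arm is selected in the `else` branch.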

Restless multi-armed bandits

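Restless multi-armed bandits (RMABs) generalize multi-armed bandits (MABs) by giving each arm a state that evolves over time, whether or not the arm is pulled. Rewards are therefore no longer i.i.d.; they depend on the arms' time-varying states. RMABs can be effectively used to model healthcare and public health problems.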
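
The RMAB setting can be sketched with a small simulation. Everything below is an assumption made for illustration: each arm is a two-state Markov chain (0 = bad, 1 = good), the transition probabilities in `P` are made up, only `budget` arms can be acted on per round, and the arm-selection rule is a simple myopic heuristic rather than an index policy such as the Whittle index.

```python
import random

# Hypothetical transition model: P[action][state] is the probability that an
# arm moves to the good state (1). Action 1 = active (pulled), 0 = passive.
P = {
    0: {0: 0.1, 1: 0.7},  # passive arms tend to drift toward the bad state
    1: {0: 0.6, 1: 0.9},  # active arms are more likely to be in the good state
}


def myopic_policy(states, budget):
    """Pick the `budget` arms with the largest one-step gain from acting."""
    gains = [(P[1][s] - P[0][s], i) for i, s in enumerate(states)]
    gains.sort(reverse=True)
    return {i for _, i in gains[:budget]}


def simulate(n_arms, budget, horizon, seed=0):
    """Simulate the RMAB; reward per round = number of arms in the good state."""
    rng = random.Random(seed)
    states = [rng.randint(0, 1) for _ in range(n_arms)]
    total_reward = 0
    for _ in range(horizon):
        active = myopic_policy(states, budget)
        # Every arm transitions each round, pulled or not: the "restless" part.
        states = [
            1 if rng.random() < P[int(i in active)][s] else 0
            for i, s in enumerate(states)
        ]
        total_reward += sum(states)
    return total_reward


if __name__ == "__main__":
    print("total reward:", simulate(n_arms=10, budget=3, horizon=200))
```

In a public health reading of this sketch, an arm could stand for a patient, the state for whether they are adhering to treatment, and the budget for the limited number of interventions available per round.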

Reinforcement learning

How can we design sample-efficient and compute-efficient offline and online RL algorithms for practical applications?

What theory and techniques are needed to support these algorithms?