Reinforcement Learning and online learning

Online learning, multi-armed bandits

How can we design algorithms with regret bounds that balance exploration and exploitation?

How do domain-specific constraints and knowledge affect algorithm design?

Optimistic Whittle Index Policy: Online Learning for Restless Bandits
Kai Wang*, Lily Xu*, Aparna Taneja, Milind Tambe (AAAI 2023)
Dual-Mandate Patrols: Multi-Armed Bandits for Green Security
Lily Xu, Elizabeth Bondi, Fei Fang, Andrew Perrault, Kai Wang, Milind Tambe (AAAI 2021, Best Paper Runner-Up)

Restless multi-armed bandits

Restless multi-armed bandits (RMABs) generalize multi-armed bandits (MABs) by allowing non-i.i.d. reward distributions that depend on each arm's time-varying state. RMABs are well suited to modeling healthcare and public health problems.
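To make the state-dependent reward structure concrete, here is a minimal simulation sketch in Python. It is illustrative only and is not taken from any of the papers listed here: the two-state arms, the transition probabilities in `ARM_P`, and the `myopic_policy` heuristic are all assumptions chosen for the example. The heuristic is a simple one-step baseline, not the Whittle index policy.

```python
import random

# Hypothetical two-state arms: state 1 is "good" (reward 1), state 0 is "bad" (reward 0).
# ARM_P[i][s][a] = probability that arm i moves to the good state from state s
# under action a (a = 1: the arm is pulled, a = 0: the arm is left passive).
ARM_P = [
    [[0.1, 0.6], [0.7, 0.9]],    # arm 0
    [[0.2, 0.4], [0.5, 0.95]],   # arm 1
    [[0.05, 0.8], [0.6, 0.7]],   # arm 2
]


def myopic_policy(states, budget):
    """Pull the `budget` arms whose one-step gain from acting is largest."""
    gains = [ARM_P[i][s][1] - ARM_P[i][s][0] for i, s in enumerate(states)]
    ranked = sorted(range(len(states)), key=lambda i: gains[i], reverse=True)
    return set(ranked[:budget])


def simulate(horizon, budget, seed=0):
    """Run the bandit for `horizon` rounds and return the total reward."""
    rng = random.Random(seed)
    states = [0] * len(ARM_P)  # all arms start in the bad state
    total_reward = 0
    for _ in range(horizon):
        pulled = myopic_policy(states, budget)
        # Every arm transitions, pulled or not: this is what makes the bandit
        # "restless", and why rewards are not i.i.d. across rounds.
        states = [
            1 if rng.random() < ARM_P[i][s][1 if i in pulled else 0] else 0
            for i, s in enumerate(states)
        ]
        # Reward in a round is the number of arms in the good state after the transition.
        total_reward += sum(states)
    return total_reward


if __name__ == "__main__":
    print(simulate(horizon=100, budget=1))
```

In a classical MAB, an arm's reward distribution would be fixed over time. Here it depends on the arm's current state, which keeps evolving even when the arm is not pulled, so a good policy has to reason about future states as well as immediate gains.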

Networked Restless Multi-Arm Bandits with Reinforcement Learning
Hanmo Zhang, Kai Wang (PRL workshop at AAAI 2025, in progress, new)

Reinforcement learning

How can we design sample- and compute-efficient offline and online RL algorithms for practical applications?

What theory and techniques underpin these algorithms?

Soft Diffusion Actor-Critic: Efficient Online Reinforcement Learning for Diffusion Policy
Haitong Ma, Tianyi Chen, Kai Wang, Na Li*, Bo Dai* (in submission, new)

Primal-Dual Spectral Representation for Off-policy Evaluation
Yang Hu, Tianyi Chen, Na Li, Kai Wang, Bo Dai (AISTATS 2025)