Title: Batched Multi-Armed Bandits with Optimal Regret

Authors: Hossein Esfandiari, Amin Karbasi, Abbas Mehrabian, Vahab S. Mirrokni


In this paper, we propose three online algorithms for submodular maximization. The first one, Mono-Frank-Wolfe, reduces the number of per-function gradient evaluations from $T^{1/2}$ [Chen2018Online] and $T^{3/2}$ [chen2018projection] to 1, and achieves a $(1-1/e)$-regret bound of $O(T^{4/5})$. The second one, Bandit-Frank-Wolfe, is the first bandit algorithm for continuous DR-submodular maximization, and achieves a $(1-1/e)$-regret bound of $O(T^{8/9})$. Finally, we extend Bandit-Frank-Wolfe to a bandit algorithm for discrete submodular maximization, Responsive-Frank-Wolfe, which attains a $(1-1/e)$-regret bound of $O(T^{8/9})$ in the responsive bandit setting.
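
For readers unfamiliar with the term, the $(1-1/e)$-regret mentioned above is usually defined as below. This is the standard definition in online submodular maximization and is given here only as a reference sketch: the symbols $f_t$ (the objective revealed at round $t$), $x_t$ (the point played at round $t$), and $\mathcal{K}$ (the feasible set) are assumed notation that does not appear in the abstract, and for randomized algorithms the second sum is typically taken in expectation.

$$
\mathcal{R}_{1-1/e}(T) \;=\; \left(1 - \frac{1}{e}\right) \max_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x) \;-\; \sum_{t=1}^{T} f_t(x_t)
$$

Under this definition, a bound such as $O(T^{4/5})$ is sublinear in $T$, so the average per-round gap to a $(1-1/e)$ fraction of the best fixed decision in hindsight vanishes as $T$ grows.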

Full Text: [PDF]
