
Prof. Shipra Agrawal
Columbia University
Talk:
Online revenue management under global convex constraints and objective
Abstract:
We consider a very general model for managing the exploration–exploitation trade-off, which allows global convex constraints and concave objective on the aggregate decisions over time in addition to the customary limitation on the time horizon. This model provides a natural framework to study many sequential decision-making problems with long-term convex constraints and concave utility and subsumes the classic multiarmed bandit (MAB) model and the bandits with knapsacks problem as special cases. We demonstrate that a natural extension of the upper confidence bound family of algorithms for MAB provides a polynomial time algorithm with near-optimal regret guarantees for this substantially more general model. We also provide computationally more efficient algorithms by establishing interesting connections between this problem and other well-studied problems/algorithms, such as the Blackwell approachability problem, online convex optimization, and the Frank–Wolfe technique for convex optimization. We give several concrete examples of applications, particularly in risk-sensitive revenue management under unknown demand distributions, in which this more general bandit model of sequential decision making allows for richer formulations and more efficient solutions of the problem.