Abstract
Thompson Sampling has recently been shown to achieve the lower bound on regret in the Bernoulli Multi-Armed Bandit setting. That setting assumes stationary reward distributions, yet it is often unrealistic to model real-world reward processes as stationary. In this paper we derive and evaluate algorithms that use Thompson Sampling for a Switching Multi-Armed Bandit problem. We propose a Thompson Sampling strategy equipped with a Bayesian change-point mechanism, and develop algorithms for several cases with a constant switching rate: when a switch changes all arms at once (Global Switching), when each arm switches independently (Per-Arm Switching), and when the switching rate is either known or must be inferred from data. This yields a family of algorithms we collectively term Change-Point Thompson Sampling (CTS). We report empirical results, with comparisons to other bandit algorithms, in four artificial environments and two derived from real-world data: news click-through rates and foreign exchange rates. On the real-world data CTS is the most effective.
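For intuition, the Global Switching variant with a known switching rate can be sketched as ordinary Bernoulli Thompson Sampling layered on an Adams-MacKay style run-length posterior: sample which run length (time since the last switch) is current, then sample arm means from the Beta posteriors conditioned on that hypothesis. The sketch below is a reconstruction from the abstract alone, not the authors' reference code; the class name `GlobalCTS`, the `hazard` parameter, and the hard truncation of the run-length support are illustrative assumptions.

```python
import numpy as np

class GlobalCTS:
    """Sketch of Thompson Sampling with Bayesian online change-point
    detection, Global Switching case, known switching rate (assumed API)."""

    def __init__(self, n_arms, hazard, max_run=200, rng=None):
        self.n_arms = n_arms
        self.hazard = hazard            # assumed known per-step switching rate
        self.max_run = max_run          # truncated run-length support (assumption)
        self.rng = rng or np.random.default_rng()
        self.w = np.array([1.0])        # P(last change was r steps ago)
        self.a = np.ones((1, n_arms))   # Beta success counts per run length/arm
        self.b = np.ones((1, n_arms))   # Beta failure counts per run length/arm

    def select_arm(self):
        # Sample a run length, then do ordinary Bernoulli Thompson Sampling
        # with the Beta posteriors of that run-length hypothesis.
        r = self.rng.choice(len(self.w), p=self.w)
        return int(np.argmax(self.rng.beta(self.a[r], self.b[r])))

    def update(self, arm, reward):
        # Beta-Bernoulli predictive probability of the observed binary reward
        mean = self.a[:, arm] / (self.a[:, arm] + self.b[:, arm])
        like = mean if reward else 1.0 - mean
        grow = self.w * like * (1.0 - self.hazard)    # run continues
        birth = np.sum(self.w * like) * self.hazard   # a switch occurred
        self.w = np.append(birth, grow)
        # Grow sufficient statistics; run length 0 restarts from the prior
        self.a = np.vstack((np.ones(self.n_arms), self.a))
        self.b = np.vstack((np.ones(self.n_arms), self.b))
        self.a[1:, arm] += reward
        self.b[1:, arm] += 1 - reward
        # Truncate and renormalise to bound per-step cost
        self.w, self.a, self.b = (x[: self.max_run] for x in (self.w, self.a, self.b))
        self.w /= self.w.sum()

# Usage: two Bernoulli arms whose means swap roughly every 500 steps
agent = GlobalCTS(n_arms=2, hazard=1 / 500)
rng, means = np.random.default_rng(0), np.array([0.3, 0.7])
for t in range(2000):
    if t % 500 == 0 and t > 0:
        means = means[::-1]             # global switch: all arms change at once
    arm = agent.select_arm()
    agent.update(arm, int(rng.random() < means[arm]))
```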
Original language | English
---|---
Title of host publication | Workshop and Conference Proceedings Volume 31
Editors | Carlos M. Carvalho, Pradeep Ravikumar
Place of Publication | Journal of Machine Learning Research (Online)
Publisher | Microtome Publishing
Pages | 442–450
Number of pages | 9
Publication status | Published - 29 Apr 2013
Event | Artificial Intelligence and Statistics (AISTATS) 2013, Scottsdale, AZ, USA; 29 Apr 2013 – 1 May 2013; http://www.jmlr.org/proceedings/papers/v31/mellor13a.html
Publication series

Name | Journal of Machine Learning Research Workshop and Conference Proceedings
---|---
Conference

Conference | Artificial Intelligence and Statistics (AISTATS) 2013
---|---
City | Scottsdale, AZ, USA
Period | 29/04/13 → 1/05/13
Internet address | http://www.jmlr.org/proceedings/papers/v31/mellor13a.html