Abstract
We consider the problem of learning an effective strategy online in a hidden-information game against an opponent whose strategy changes over time. We aim to model and exploit the opponent, and make three proposals to this end: first, to infer its hidden information using an expectation-maximisation algorithm; second, to predict its actions using a sequence-prediction method; and finally, to simulate games between our agent and the opponent model in between games against the opponent itself. Our approach requires no knowledge beyond the rules of the game, and does not assume that the opponent's strategy is stationary. Experiments in simplified poker games show that it increases the average payoff per game of a state-of-the-art no-regret learning algorithm.
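The no-regret baseline referred to above is counterfactual regret minimisation (CFR). As a minimal sketch of the regret-matching rule at the heart of such algorithms, the following illustrates how a mixed strategy is derived from accumulated regrets; the function name and the action labels are illustrative, not taken from the paper:

```python
import numpy as np

def regret_matching(cumulative_regrets):
    """Return a mixed strategy proportional to positive cumulative regrets.

    Actions with non-positive regret receive zero probability; if no
    action has positive regret, fall back to the uniform strategy.
    """
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cumulative_regrets), 1.0 / len(cumulative_regrets))

# Illustrative regrets for three poker actions: fold, call, raise.
strategy = regret_matching(np.array([2.0, -1.0, 6.0]))
# Positive regrets are [2, 0, 6], so the strategy is [0.25, 0.0, 0.75].
```

Iterating this update at every information set, and averaging the strategies over iterations, is what drives CFR-style algorithms toward equilibrium in hidden-information games such as poker.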
Field | Value
---|---
Original language | English
Pages (from-to) | 11-24
Number of pages | 14
Journal | IEEE Transactions on Computational Intelligence and AI in Games
Volume | 9
Issue number | 1
Early online date | Oct 2015
DOIs | 
Publication status | Published - 1 Mar 2017
Keywords
- Bayes methods
- Computational modeling
- Games
- Nash equilibrium
- Predictive models
- Expectation-Maximization
- Counterfactual regret minimisation