courses:xep36agt:assignments [CourseWare Wiki]

Class projects

No regret learning

In the lecture, we will show that the empirical frequencies of actions of two no regret learning algorithms (in adversarial setting) playing each other in a finite zero-sum game converge to a Nash equilibrium of the game at rate O(1/√T). There are three related questions you can investigate in the assignment.

1) Does this hold also in infinite games? Under what conditions?

2) This statement does not generally hold for algorithms that guarantee no regret only in stochastic (non-adversarial) settings. Find an illustrative counterexample for a randomized version of UCT and investigate sufficient conditions for its convergence.

3) Regret matching+ seems to empirically converge much faster than the bound in self-play in zero-sum games. Find an example game and show that the convergence rate is indeed ω(1/T). Investigate sufficient and necessary conditions for faster convergence.