
Assignment #2 - Extensive-Form Games

We consider the following problem: your agent is placed on an island with a map. The goal of the agent is to reach the destination square on the island while collecting as many gold treasures as possible, stepping on each square at most once. There are bandits on the island that want to ambush the agent and steal the collected gold. The bandits may be placed at specific squares (termed “dangerous places”, marked as “E” in the map); there can be at most one bandit at each dangerous place. Your agent only knows the overall number of bandits; he does not know the exact places where they are waiting. The bandits want to harm your agent as much as possible by deciding at which dangerous places they are going to wait. When your agent visits a place with a bandit, an attack occurs. With some probability p the attack is successful (the agent loses all the gold and the game ends); otherwise the attack is unsuccessful and your agent can continue. Your agent can react to information gained during the course of the game; e.g., after an unsuccessful attack, the agent can exploit the fact that there is one less bandit among the remaining dangerous places.

The bandits decide their positions at the beginning of the game. Moreover, the bandits can use sensors: if your agent enters an unoccupied dangerous place, an alarm is triggered and one bandit can relocate to another empty dangerous place. However, this can happen at most once. The following holds:

  • the alarm is triggered only when the agent visits an unoccupied dangerous place for the first time,
  • there is at most one alarm during the game,
  • if the first dangerous place the agent visits is occupied by a bandit, no alarm is triggered (neither then nor later in the game),
  • the bandits know which dangerous place caused the alarm (i.e., the bandits know the actual position of the agent at the time of the alarm),
  • a bandit may be relocated (but does not have to be), and it cannot be relocated to a dangerous place where there is another bandit or the agent.
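The alarm and relocation rules above can be captured in two small helper functions. This is only an illustrative sketch; the function and variable names are mine, not part of the assignment template.

```python
def visit_dangerous_place(occupied, alarm_available):
    """Agent enters a dangerous place. Returns (alarm_fires, alarm_available).
    The alarm can fire only on the first visit to a dangerous place:
    if that place is unoccupied, the alarm fires; either way, no alarm
    can fire later in the game."""
    alarm_fires = alarm_available and not occupied
    return alarm_fires, False

def relocation_targets(bandit_positions, moving_bandit, agent_position,
                       dangerous_places):
    """Dangerous places a relocated bandit may move to: any place not
    occupied by another bandit and not occupied by the agent. Staying
    put (no relocation) is also allowed, modelled here by the bandit's
    current place being a valid target."""
    others = set(bandit_positions) - {moving_bandit}
    return [p for p in dangerous_places
            if p not in others and p != agent_position]
```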

The semestral project is split into two parts:

  1. formalize this problem as a two-player zero-sum extensive-form game, and
  2. specify the sequence-form linear program for this game and solve it using an external LP solver

1. Formalize the problem as an extensive-form game

For the first part, we prepared a template that you should implement:

The template contains code to export the EFG into a format that can subsequently be parsed and loaded by Gambit (see below). You can check in the visual viewer whether the game has been modelled as you intended.

Automatic evaluation of the constructed trees is non-trivial, and you may receive points for invalid solutions. The score in the upload system therefore serves only as an upper bound on the final score. We will revise your submissions manually, and if we find errors, points will be deducted.

2. Specify the sequence-form linear program and solve it

For the second part, we prepared a template that you should implement:

You can use the following LP solvers: cvxopt, gurobipy

To set up Gurobi locally, there is a great tutorial in the Combinatorial Optimization course: https://cw.fel.cvut.cz/b182/courses/ko/labs

If you want to use another LP solver that is available as a Python package, consult with me first (it has to be added to the automatic evaluation system).

You should consult the manual of your LP solver to figure out how to input your sequence-form LP and solve for the value of the game and the players' strategies.

Update (December 3): Added tip about sequence representation:

     Tip: Actions are ordered in the sense that for two histories
     h,g in the same infoset it holds that h.actions() == g.actions().
     You can use an action's index in this list for its identification.
     Do not use str(action) or some other way of encoding the action based
     on your previous implementation.
     
     To identify **sequences**, it is not enough to use the list of action indices!
     Consider simple poker: the first player can fold or bet.
     Folding in Infoset_1 (player has card J) is, however, not the same thing
     as folding in Infoset_2 (player has card Q)!
     
     In class, we identified folding as f1 and f2, based on which
     infoset it came from.

Also, do not rely on Action being usable as a dict key, as it does not have a `hash()` function implemented.
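One way to satisfy both notes is to identify a sequence as a tuple of (infoset, action index) pairs. This is a sketch under my own representation choice, not something mandated by the template:

```python
def extend_sequence(sequence, infoset_id, action_index):
    """Append one decision to a player's sequence. Pairing the action
    index with its infoset distinguishes the same action index taken
    in different infosets."""
    return sequence + ((infoset_id, action_index),)

# Simple-poker illustration: folding is action index 0 in both infosets,
# yet the two sequences differ because the infosets differ.
seq_fold_J = extend_sequence((), "infoset_J", 0)  # player holds J
seq_fold_Q = extend_sequence((), "infoset_Q", 0)  # player holds Q
assert seq_fold_J != seq_fold_Q

# Tuples of strings and ints are hashable, so such sequences can safely
# index LP variables in a dict, unlike Action objects themselves.
lp_variable_index = {seq_fold_J: 0, seq_fold_Q: 1}
```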

Update (December 10): Previously, the template asked that you “do not use the zero-sum property” to find the expected value in the root for a given player. What we meant by that is that you are not allowed to build only one LP for both players and find the value as `u_2(root) = -u_1(root)`. For each player you should have a different set of constraints, since the players have different opponent infosets and player sequences. Within the constraints, you can of course use the fact that `u_2(z) = -u_1(z)` (where `z` is a leaf of the tree), and you can specify the LP as a maximization for both players.
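For reference, the sequence-form LP for one player has the standard shape (von Stengel's formulation; the notation below is mine, not from the template): let $x$ be the player's realization plan over his sequences with sequence constraints $Ex = e$, let $Fy = f$ be the opponent's sequence constraints with dual variables $q$ (one per opponent infoset plus the empty sequence), and let $A$ be the player's payoff matrix indexed by pairs of sequences. The player solves

```latex
\max_{x,\;q} \quad f^{\top} q
\qquad \text{s.t.} \qquad
F^{\top} q \le A^{\top} x, \qquad
E x = e, \qquad
x \ge 0 .
```

Building this LP separately for each player, each with that player's own $E$, $F$, and $A$ (where the leaf payoffs satisfy $u_2(z) = -u_1(z)$), is exactly the "two different sets of constraints" requirement.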

Deadline

Submit your solutions to the upload system.

  1. Game tree: 27.11.2019 4:00 CET [8 pts]
  2. Sequence-form LP: 11.12.2019 4:00 CET postponed to 18.12.2019 4:00 CET [9 pts]

Input

For the first task:

The assignment is to implement a program for solving the described games. The program reads input in the following format:

number: number of rows of the maze (M)
number: number of columns of the maze (N)
M rows, N symbols representing squares of the maze
number: the overall number of the bandits
float number: the probability of a successful attack

Squares of the maze are defined as follows:

# - obstacle
- - empty square
S - start of your agent
D - destination of your agent
G - gold treasure
E - dangerous place

Constants for Utility Calculation:
Your agent receives 10 points for reaching the destination and 1 additional point for each collected gold treasure. If a bandit successfully attacks the agent, your agent receives 0 points no matter how many gold treasures he was carrying before the attack. If your agent follows a path that does not end in the destination square (e.g., one that leads to a dead end), your agent receives 0 points.
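The utility rules can be written directly as a function (a sketch; the argument names are mine):

```python
def agent_utility(reached_destination, golds_collected, caught_by_bandit):
    """Terminal utility of the agent: 10 for reaching the destination plus
    1 per collected gold treasure; 0 after a successful bandit attack or
    if the path never reaches the destination."""
    if caught_by_bandit or not reached_destination:
        return 0
    return 10 + golds_collected
```

In the zero-sum formalization, the bandits' utility at the same leaf is simply the negative of this value.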

Input Example:

7
9
#########
#E----EG#
#-##-##E#
#S##-##-#
#-##-##-#
#E-----D#
#########
2
0.5
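Reading this format might look as follows (a minimal sketch; the function name and variables are mine, not from the template):

```python
import sys

def read_maze_input(lines):
    """Parse the input: row/column counts, the maze itself, the number
    of bandits, and the probability of a successful attack."""
    it = iter(lines)
    rows = int(next(it))
    cols = int(next(it))
    maze = [next(it).rstrip("\n") for _ in range(rows)]
    assert all(len(row) == cols for row in maze)
    num_bandits = int(next(it))
    attack_prob = float(next(it))
    return maze, num_bandits, attack_prob

# Typical use with the program's stdin:
# maze, num_bandits, attack_prob = read_maze_input(sys.stdin)
```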

For the second task:

The input starts the same as in the first task, and there is one additional row that specifies the index of the player (0 or 1).

Output

For the first task:

The output must be a valid gambit file, as defined by the `export_gambit` function in the python template.

For the second task:

The output of your program is the expected utility in the root node for the given player.

Conventions & FAQ

  • Your agent moves in the maze in 4 directions (up, down, left, right).
  • As soon as the agent reaches the destination square, the game ends.
  • Use the name game_tree.py for the first task, and game_lp.py for the second. You should import the tree file in the LP solving file.
  • Use Python >= 3.6

Examples and Results

The value of the game must be exactly the same (modulo floating-point imprecision). If the difference is larger (e.g., > 0.001), your solution is incorrect!

7
9
#########
#G-----E#
#-#####-#
#S--E--D#
#-#####-#
#G-----E#
#########
2
0.5
the value of the game: 7.096774193548387

7
9
#########
#G-E---E#
#-#####-#
#S-E--ED#
#-#####-#
#G-E---E#
#########
2
0.6
the value of the game: 3.4064516129032265

7
9
#########
#E----EG#
#-##-##E#
#S##-##-#
#-##-##-#
#E-----D#
#########
2
0.5
the value of the game: 5.5

5
7
#######
#E---E#
#S#-#D#
#E--EG#
#######
1
0.7
the value of the game: 5.054761904761906

Gambit

The gambit project is located at http://www.gambit-project.org/

Installing the version published on the website did not work for me, but building manually from the sources did (Ubuntu 19.04):

  git clone git://github.com/gambitproject/gambit.git
  cd gambit

After this, you will need to set up the build scripts by executing:

  aclocal
  libtoolize
  automake --add-missing
  autoconf

You will need to install the GUI support libraries (libwxgtk >= 2.8.0):

  sudo apt-get install libwxgtk3.0-gtk3-dev
  sudo apt-get install libwxgtk3.0-gtk3-0v5

Then

  ./configure
  make -j 8 # to compile faster using 8 cores
  sudo make install

You can launch gambit!

  gambit

You can generate a tree, and visualize it in gambit:

  python game_tree.py < maze1.txt > maze1.efg
  gambit maze1.efg

courses/be4m36mas/assignment2-game.txt · Last modified: 2019/12/10 11:15 by sustrmic