Warning
This page is located in archive.

Prisoner’s Dilemma

This topic is explained in 1st lecture, 1st week. Prisoner’s dilemma and its variants can be found in large amount of literature and everyone can find a variant, which suits him. Good starting point for study is definitely the wikipedia page.

The Problem

We shall explain the principle of non-cooperative two player game on a specific example, which we will borrow from chapter 3 of the book [Shoham-Brown2009]. Internet traffic is controlled by the TCP (Transmission Control Protocol). Imagine, there are only two players in the network. You (player A) and someone else (player B). One of the basic mechanism used, is the backoff mechanism. If you send too many packets and oversaturate the network, you both backoff and wait for a random amount of time before repeating your broadcast. At least, that is how the proper (Correct) implementation works. The incorrect one (Defective) means you do not back off and hope the other does. Thus, you can pick one of either two strategies, either to cooperate (Correct) or not to cooperate (Defective). If both you and your colleague pick C, then the average packet delay is 1ms. If you both pick D, then the average packet delay is 3ms, due to higher load on the network. Finally, in case of asymmetrical choice, packets of player picking D travel without any delay, while the packets of player picking C travel with 4ms delay.

C D
C -1, -1 -4, 0
D 0, -4 -3, -3

This is the so called payoff matrix. Which strategy do you pick? Your target of course is to have your packets delivered with minimal possible delay. The problem is, that the protocol does not allow for communication beforehand. Packets are sent repeatedly and thus the proper strategy has to be chosen repeatedly as well. The only information you have for an iterated game is its history, so you know how you played and how did you colleague play.

The Assignment

Create a player (agent), which is able to play iterated non-cooperative two player game. The input will be payoff matrix, the number of iterations may or many not be known in advance. The objective is to either maximize your gain or to minimize your losses.

You can find more detailed specification here.

Turn in

This assignment has two terms for turn in, you can see those in the upload system.

  1. Simple player, which fulfills the formal requirements, that is it is able to play (assignment 03_PD_player)
  2. Smart player, which is going to play against all others. Its main task is to maximize its gain over the course of the tournament. (assignment 04_PD_tournament)

Evaluation and terms. (How many points can you get and for what)

Results

The results can be found here

courses/ae4b99rph/labs/prisoners_dilemma/start.txt · Last modified: 2013/10/04 13:02 (external edit)