This task exercises Markov decision processes (MDPs) in a world represented by a rectangular grid. The agent's goal is to choose actions for its pass through the grid world that maximize the total reward collected over the episode.
First install the MDP Toolbox for Matlab. Get familiar with the demo function demo_russell.m, which implements an example grid world; see demo_russell for details. Solve the following subtasks independently and individually:
Deadline: Tuesday 23.5.2017, 23:59 CEST
MDP Toolbox inconsistencies that can complicate your solution
The function mk_grid_world contains an incorrect comment: "… actions are numbered as North (1), South (2), East (3), West (4)". In fact, the actual mapping is N = 1, E = 2, S = 3, W = 4 (i.e., North (1), East (2), South (3), West (4)).
MDP Toolbox numbers all grid worlds so that the field (1,1), or 1 when a one-dimensional index is used, is the upper-left field. This is consistent with the formulation in Sutton and inconsistent with the original formulation in Russell.
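To keep the two conventions above straight in your own code, it can help to pin them down once as named constants and conversion helpers. A minimal sketch in Python (the toolbox itself is Matlab; names here are illustrative, and the row-major linear numbering is an assumption you should verify against the toolbox, since Matlab's native linear indexing is column-major):

```python
# Action numbering actually used by mk_grid_world (not the one in its comment):
NORTH, EAST, SOUTH, WEST = 1, 2, 3, 4

def to_linear(row, col, n_cols):
    """Convert a 1-based (row, col), with (1, 1) the upper-left field,
    to a 1-based linear index, assuming row-major numbering."""
    return (row - 1) * n_cols + col

def to_grid(index, n_cols):
    """Inverse conversion: 1-based linear index back to 1-based (row, col)."""
    return (index - 1) // n_cols + 1, (index - 1) % n_cols + 1
```

For a 3x4 world this maps (1, 1) to index 1 and (2, 1) to index 5, matching the upper-left-origin convention described above.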
Solutions without Matlab
Matlab is not compulsory. For inspiration, see Mr Horak's earlier solution JavaMDP, which can also serve as a starting point for your own Java solution. The minimum necessary change is replacing the current value iteration with policy iteration. The GUI can be used unchanged or with minimal changes; it is not part of the evaluation.
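The required switch from value iteration to policy iteration amounts to alternating two steps: evaluate the current policy until its value function converges, then improve the policy greedily with a one-step lookahead, stopping when the policy no longer changes. A self-contained sketch in Python on a tiny deterministic grid world (layout, rewards, and all names here are illustrative assumptions, not the assignment's Russell world or the JavaMDP code):

```python
# Tiny deterministic grid world: states are (row, col) cells.
ROWS, COLS = 3, 4
GAMMA = 0.9
ACTIONS = {'N': (-1, 0), 'E': (0, 1), 'S': (1, 0), 'W': (0, -1)}
GOAL = (0, 3)  # terminal cell, reward +1 on entry (illustrative)

def step(state, action):
    """Deterministic transition: move if the target cell is inside the grid."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < ROWS and 0 <= nc < COLS:
        return (nr, nc)
    return state  # bumping into the wall leaves the agent in place

def reward(next_state):
    return 1.0 if next_state == GOAL else -0.04  # small step cost

STATES = [(r, c) for r in range(ROWS) for c in range(COLS) if (r, c) != GOAL]

def policy_iteration():
    policy = {s: 'N' for s in STATES}
    V = {s: 0.0 for s in STATES}
    V[GOAL] = 0.0  # terminal state has value 0
    while True:
        # 1) Policy evaluation: sweep until the values of the fixed policy settle.
        while True:
            delta = 0.0
            for s in STATES:
                s2 = step(s, policy[s])
                v = reward(s2) + GAMMA * V[s2]
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < 1e-8:
                break
        # 2) Policy improvement: greedy one-step lookahead w.r.t. V.
        stable = True
        for s in STATES:
            best = max(ACTIONS,
                       key=lambda a: reward(step(s, a)) + GAMMA * V[step(s, a)])
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:  # no action changed: the policy is optimal
            return policy, V
```

With this layout, policy_iteration() sends the cell left of the goal east and the cell below it north, i.e. straight into the terminal field; the same evaluate-then-improve loop is what replaces the value-iteration backup in a Java port.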
Evaluation
The assignment is worth 5 points; a minimum of 2.5 points must be reached.