This task exercises Markov decision processes (MDPs) in a world represented by a rectangular grid. The agent's goal is to choose actions for its pass through the grid world that maximize the total reward collected over the episode.
First install the MDP Toolbox for Matlab. Get familiar with the demo function demo_russell.m, which implements an example grid world; see demo_russell for details. Solve the following subtasks independently and individually:
Deadline: Tuesday 23.5.2017, 23:59 CEST
MDP Toolbox inconsistencies that can complicate your solution
The function mk_grid_world contains an incorrect comment: "… actions are numbered as North (1), South (2), East (3), West (4)". In fact, the actual mapping is N = 1, E = 2, S = 3, W = 4 (i.e., North (1), East (2), South (3), West (4)).
MDP Toolbox numbers all grid worlds so that the field (1,1), or 1 when a one-dimensional index is used, is the upper-left field. This is consistent with the formulation in Sutton and inconsistent with the original formulation in Russell.
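To keep the two conventions above straight in your own code, it can help to pin them down once as named constants and conversion helpers. A minimal sketch in Python (the toolbox itself is Matlab; names here are illustrative, and the row-major linear numbering is an assumption you should verify against the toolbox, since Matlab's native linear indexing is column-major):

```python
# Action numbering actually used by mk_grid_world (not the one in its comment):
NORTH, EAST, SOUTH, WEST = 1, 2, 3, 4

def to_linear(row, col, n_cols):
    """Convert a 1-based (row, col), with (1, 1) the upper-left field,
    to a 1-based linear index, assuming row-major numbering."""
    return (row - 1) * n_cols + col

def to_grid(index, n_cols):
    """Inverse conversion: 1-based linear index back to 1-based (row, col)."""
    return (index - 1) // n_cols + 1, (index - 1) % n_cols + 1
```

For a 3x4 world this maps (1, 1) to index 1 and (2, 1) to index 5, matching the upper-left-origin convention described above.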
Solutions without Matlab
Matlab is not compulsory. For inspiration, see Mr Horak's earlier solution JavaMDP, which can also serve as a starting point for your own Java solution. The minimum necessary change is replacing the current value iteration with policy iteration. The GUI can be used unchanged or with minimal changes; it is not part of the evaluation.
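The required switch from value iteration to policy iteration amounts to alternating two steps: evaluate the current policy until its value function converges, then improve the policy greedily with a one-step lookahead, stopping when the policy no longer changes. A self-contained sketch in Python on a tiny deterministic grid world (layout, rewards, and all names here are illustrative assumptions, not the assignment's Russell world or the JavaMDP code):

```python
# Tiny deterministic grid world: states are (row, col) cells.
ROWS, COLS = 3, 4
GAMMA = 0.9
ACTIONS = {'N': (-1, 0), 'E': (0, 1), 'S': (1, 0), 'W': (0, -1)}
GOAL = (0, 3)  # terminal cell, reward +1 on entry (illustrative)

def step(state, action):
    """Deterministic transition: move if the target cell is inside the grid."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < ROWS and 0 <= nc < COLS:
        return (nr, nc)
    return state  # bumping into the wall leaves the agent in place

def reward(next_state):
    return 1.0 if next_state == GOAL else -0.04  # small step cost

STATES = [(r, c) for r in range(ROWS) for c in range(COLS) if (r, c) != GOAL]

def policy_iteration():
    policy = {s: 'N' for s in STATES}
    V = {s: 0.0 for s in STATES}
    V[GOAL] = 0.0  # terminal state has value 0
    while True:
        # 1) Policy evaluation: sweep until the values of the fixed policy settle.
        while True:
            delta = 0.0
            for s in STATES:
                s2 = step(s, policy[s])
                v = reward(s2) + GAMMA * V[s2]
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < 1e-8:
                break
        # 2) Policy improvement: greedy one-step lookahead w.r.t. V.
        stable = True
        for s in STATES:
            best = max(ACTIONS,
                       key=lambda a: reward(step(s, a)) + GAMMA * V[step(s, a)])
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:  # no action changed: the policy is optimal
            return policy, V
```

With this layout, policy_iteration() sends the cell left of the goal east and the cell below it north, i.e. straight into the terminal field; the same evaluate-then-improve loop is what replaces the value-iteration backup in a Java port.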
Evaluation
The assignment is worth 5 points; a minimum of 2.5 points must be reached.