Sequential decisions under uncertainty

Task: Implement the value iteration and policy iteration methods.

The kuimaze package (zip-archive) was updated. I will be using it during the lectures to demo the algorithms.

kuimaze.Maze now has a new visualization option for displaying node utilities (policy visualization is yet to be added) and new public methods:

The constructor has also been updated. It allows a more flexible grid_world specification. See the provided mdp_sandbox.py.

The policy and utils are expected to be dictionaries indexed by the tuple (state.x, state.y). As an example, see the init functions below (also in mdp_sandbox.py):

import random  # needed by init_policy below

def init_utils(problem):
    '''
    Initialize all state utilities to zero except the goal states
    :param problem: problem - object, for us it will be kuimaze.Maze object
    :return: dictionary of utilities, indexed by state coordinates
    '''
    utils = dict()
    for state in problem.get_all_states():
        if problem.is_goal_state(state):
            utils[(state.x, state.y)] = state.reward
        else:
            utils[(state.x, state.y)] = 0
    return utils
 
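def init_policy(problem):
    '''
    Initialize a random policy; goal states get no action
    :param problem: problem - object, for us it will be kuimaze.Maze object
    :return: dictionary of actions, indexed by state coordinates
    '''
    policy = dict()
    for state in problem.get_all_states():
        if problem.is_goal_state(state):
            # No action is taken in a goal state
            policy[(state.x, state.y)] = None
            continue
        actions = list(problem.get_actions(state))
        policy[(state.x, state.y)] = random.choice(actions)
    return policy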
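
To check the expected dictionary layout without loading a maze, the two initializations can be tried on a small stand-in object. The sketch below is only an illustration: ToyProblem, the State namedtuple, the reward values and the string action labels are all hypothetical and are not part of kuimaze. The stand-in mimics only the methods used by the init functions (get_all_states, is_goal_state, get_actions).

```python
import random
from collections import namedtuple

# Hypothetical stand-in for a kuimaze state: only x, y and reward are modelled.
State = namedtuple("State", ["x", "y", "reward"])


class ToyProblem:
    """Tiny 2x2 grid exposing only the methods the init functions rely on."""

    def __init__(self):
        self._states = [State(0, 0, -0.04), State(1, 0, -0.04),
                        State(0, 1, -0.04), State(1, 1, 1.0)]
        self._goals = {(1, 1)}

    def get_all_states(self):
        return list(self._states)

    def is_goal_state(self, state):
        return (state.x, state.y) in self._goals

    def get_actions(self, state):
        # Placeholder labels (assumption); kuimaze defines its own action type.
        return ["UP", "RIGHT", "DOWN", "LEFT"]


problem = ToyProblem()

# Same layout as init_utils: goal states start at their reward, others at 0.
utils = {(s.x, s.y): (s.reward if problem.is_goal_state(s) else 0)
         for s in problem.get_all_states()}

# Same layout as init_policy: None in goal states, a random action elsewhere.
policy = {(s.x, s.y): (None if problem.is_goal_state(s)
                       else random.choice(list(problem.get_actions(s))))
          for s in problem.get_all_states()}

# Both dictionaries share the same (x, y) keys.
for s in problem.get_all_states():
    key = (s.x, s.y)
    print(key, utils[key], policy[key])
```

Note that in Python policy[x, y] and policy[(x, y)] address the same key, so both spellings can be mixed freely.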