Sequential decisions under uncertainty

Task: Implement the value iteration and policy iteration methods.
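
For orientation, the updates behind the two methods can be written as below. The notation is an assumption, since this page does not fix one: U is the utility, R(s) the reward of state s, \gamma the discount factor, and P(s' | s, a) the transition model. If your reward depends on the action or on the next state, move R inside the sum accordingly.

```latex
% Value iteration: repeat the Bellman update until the utilities stop changing
U_{k+1}(s) = R(s) + \gamma \max_{a} \sum_{s'} P(s' \mid s, a)\, U_k(s')

% Policy iteration, step 1 (policy evaluation for a fixed policy \pi)
U^{\pi}(s) = R(s) + \gamma \sum_{s'} P(s' \mid s, \pi(s))\, U^{\pi}(s')

% Policy iteration, step 2 (policy improvement)
\pi'(s) = \arg\max_{a} \sum_{s'} P(s' \mid s, a)\, U^{\pi}(s')
```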

The kuimaze package (zip-archive) was updated. I will be using it during the lectures to demo the algorithms.

The kuimaze.Maze class now has a new visualization option for displaying node utilities (policy visualization is yet to be added) and new public methods:

The constructor has also been updated. It allows a more flexible grid_world specification. See the provided

The policy and utils are expected to be dictionaries, indexed by the tuple (state.x, state.y). As an example see the init functions below (also in the

import random


def init_utils(problem):
    """
    Initialize all state utilities to zero except the goal states
    :param problem: problem - object, for us it will be kuimaze.Maze object
    :return: dictionary of utilities, indexed by state coordinates
    """
    utils = dict()
    for state in problem.get_all_states():
        if problem.is_goal_state(state):
            # goal states keep their reward as utility
            utils[(state.x, state.y)] = state.reward
        else:
            utils[(state.x, state.y)] = 0
    return utils


def init_policy(problem):
    policy = dict()
    for state in problem.get_all_states():
        if problem.is_goal_state(state):
            # no action is taken in a goal state
            policy[(state.x, state.y)] = None
            continue
        # otherwise start from a random action
        actions = [action for action in problem.get_actions(state)]
        policy[(state.x, state.y)] = random.choice(actions)
    return policy
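
To show how dictionaries indexed by (x, y) fit into value iteration, here is a minimal sketch on a made-up 1x4 corridor. It deliberately does not use the kuimaze API: the names `STATES`, `TERMINALS`, `ACTIONS`, `REWARD`, `transitions`, `value_iteration` and `greedy_policy` are all hypothetical, and the deterministic transition model, the step reward of -0.04 and the discount factor 0.9 are assumptions chosen only to keep the example small.

```python
# Hypothetical toy world: a corridor of four cells, the rightmost one is the goal.
STATES = [(x, 0) for x in range(4)]
TERMINALS = {(3, 0)}
ACTIONS = ["LEFT", "RIGHT"]
REWARD = {s: (1.0 if s in TERMINALS else -0.04) for s in STATES}


def transitions(state, action):
    """Return a list of (next_state, probability); deterministic here."""
    x, y = state
    nx = x - 1 if action == "LEFT" else x + 1
    if (nx, y) not in REWARD:  # bumping into a wall keeps the agent in place
        nx = x
    return [((nx, y), 1.0)]


def value_iteration(gamma=0.9, epsilon=1e-6):
    # Utilities are a dict indexed by (x, y); goal states keep their reward.
    utils = {s: (REWARD[s] if s in TERMINALS else 0.0) for s in STATES}
    while True:
        new_utils = dict(utils)
        delta = 0.0
        for s in STATES:
            if s in TERMINALS:
                continue
            # Bellman update: reward plus discounted best expected utility
            best = max(
                sum(p * utils[s2] for s2, p in transitions(s, a))
                for a in ACTIONS
            )
            new_utils[s] = REWARD[s] + gamma * best
            delta = max(delta, abs(new_utils[s] - utils[s]))
        utils = new_utils
        if delta < epsilon:
            return utils


def greedy_policy(utils):
    # Policy is a dict indexed by (x, y); goal states map to None.
    policy = {}
    for s in STATES:
        if s in TERMINALS:
            policy[s] = None
        else:
            policy[s] = max(
                ACTIONS,
                key=lambda a: sum(p * utils[s2] for s2, p in transitions(s, a)),
            )
    return policy


utils = value_iteration()
policy = greedy_policy(utils)
```

In this toy world the extracted policy moves right in every non-goal cell. For the assignment, the state list, the available actions and the transition probabilities would come from the problem object instead of these hand-written stand-ins.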

courses/be5b33kui/labs/sequential_decisions/start.txt · Last modified: 2018/02/06 08:33 (external edit)