Task: Implement the value iteration and policy iteration methods.
The kuimaze package (zip archive) has been updated. I will use it during the lectures for demoing the algorithms.
kuimaze.Maze now offers a visualization option for displaying node utilities (policy visualization is yet to be added) and has new public methods.
The constructor has also been updated; it allows a more flexible grid_world specification. See the provided mdp_sandbox.py.
The policy and utils are expected to be dictionaries indexed by the tuple (state.x, state.y). As an example, see the init functions below (also in mdp_sandbox.py):
import random

def init_utils(problem):
    '''
    Initialize all state utilities to zero, except for the goal states.
    :param problem: problem object, for us it will be a kuimaze.Maze object
    :return: dictionary of utilities, indexed by state coordinates
    '''
    utils = dict()
    for state in problem.get_all_states():
        if problem.is_goal_state(state):
            utils[(state.x, state.y)] = state.reward
        else:
            utils[(state.x, state.y)] = 0
    return utils

def init_policy(problem):
    '''
    Initialize the policy with a random action for each non-goal state.
    :param problem: problem object, for us it will be a kuimaze.Maze object
    :return: dictionary of actions, indexed by state coordinates
    '''
    policy = dict()
    for state in problem.get_all_states():
        if problem.is_goal_state(state):
            policy[(state.x, state.y)] = None
            continue
        actions = [action for action in problem.get_actions(state)]
        policy[(state.x, state.y)] = random.choice(actions)
    return policy
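To illustrate what the two methods you are asked to implement actually compute, here is a minimal, self-contained sketch of value iteration and policy iteration on a made-up toy MDP (a 1-D corridor of four states). It deliberately does not use the kuimaze API: the states, transitions, and rewards below are invented for the example, and the structure (Bellman update, greedy policy extraction, evaluate/improve loop) is what carries over to the assignment.

```python
# Toy MDP for illustration only -- not the kuimaze API.
# States 0..3 in a corridor; state 3 is the terminal goal (reward +1),
# every other step costs -0.04. Actions are deterministic here.
GAMMA = 0.9
EPSILON = 1e-6
STATES = [0, 1, 2, 3]
GOAL = 3
ACTIONS = ['left', 'right']

def transitions(state, action):
    """Return a list of (next_state, probability) pairs."""
    if action == 'right':
        return [(min(state + 1, 3), 1.0)]
    return [(max(state - 1, 0), 1.0)]

def reward(state):
    return 1.0 if state == GOAL else -0.04

def value_iteration():
    """Repeat the Bellman update until the largest change is tiny."""
    utils = {s: 0.0 for s in STATES}
    while True:
        new_utils, delta = {}, 0.0
        for s in STATES:
            if s == GOAL:
                new_utils[s] = reward(s)
            else:
                best = max(sum(p * utils[ns]
                               for ns, p in transitions(s, a))
                           for a in ACTIONS)
                new_utils[s] = reward(s) + GAMMA * best
            delta = max(delta, abs(new_utils[s] - utils[s]))
        utils = new_utils
        if delta < EPSILON * (1 - GAMMA) / GAMMA:
            return utils

def greedy_policy(utils):
    """Extract the policy that is greedy w.r.t. the given utilities."""
    policy = {GOAL: None}
    for s in STATES:
        if s == GOAL:
            continue
        policy[s] = max(ACTIONS,
                        key=lambda a: sum(p * utils[ns]
                                          for ns, p in transitions(s, a)))
    return policy

def policy_iteration():
    """Alternate approximate policy evaluation and greedy improvement."""
    policy = {s: 'right' for s in STATES}
    policy[GOAL] = None
    utils = {s: 0.0 for s in STATES}
    while True:
        # Policy evaluation: a fixed number of in-place sweeps is
        # sufficient for this toy example.
        for _ in range(50):
            for s in STATES:
                if s == GOAL:
                    utils[s] = reward(s)
                else:
                    utils[s] = reward(s) + GAMMA * sum(
                        p * utils[ns]
                        for ns, p in transitions(s, policy[s]))
        improved = greedy_policy(utils)
        if improved == policy:
            return policy, utils
        policy = improved
```

In the assignment, the dictionaries would be indexed by (state.x, state.y) instead of a plain integer, and the successor distribution would come from the kuimaze problem object rather than the hand-written transitions function; the update rules themselves are unchanged.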