====== Sequential decisions under uncertainty ======

**Task:** Implement the value iteration and policy iteration methods.

The [[https://cw.felk.cvut.cz/courses/be5b33kui/kuimaze.zip|kuimaze package (zip-archive)]] has been updated; I will be using it during the lectures to demo the algorithms. The ''kuimaze.Maze'' class has a new visualization option for displaying node utilities (policy visualization is yet to be added) and the following new public methods:

  * [[https://cw.felk.cvut.cz/courses/be5b33kui/kuimaze_doc/kuimaze.maze.Maze-class.html#get_all_states|get_all_states()]]
  * [[https://cw.felk.cvut.cz/courses/be5b33kui/kuimaze_doc/kuimaze.maze.Maze-class.html#get_next_states_and_probs|get_next_states_and_probs(state, action)]]
  * [[https://cw.felk.cvut.cz/courses/be5b33kui/kuimaze_doc/kuimaze.maze.Maze-class.html#set_node_utils|set_node_utils(utils)]]

The constructor has also been updated: it now allows a more flexible ''grid_world'' specification. See the provided ''mdp_sandbox.py''.

The ''policy'' and ''utils'' are expected to be dictionaries indexed by the tuple ''(state.x, state.y)''. As an example, see the init functions below (also in ''mdp_sandbox.py''):

<code python>
import random


def init_utils(problem):
    '''
    Initialize all state utilities to zero except the goal states.
    :param problem: object with the MDP interface; here a kuimaze.Maze object
    :return: dictionary of utilities, indexed by state coordinates
    '''
    utils = dict()
    for state in problem.get_all_states():
        if problem.is_goal_state(state):
            utils[(state.x, state.y)] = state.reward
        else:
            utils[(state.x, state.y)] = 0
    return utils


def init_policy(problem):
    '''
    Initialize the policy to a random action in every non-goal state.
    :param problem: object with the MDP interface; here a kuimaze.Maze object
    :return: dictionary of actions, indexed by state coordinates
    '''
    policy = dict()
    for state in problem.get_all_states():
        if problem.is_goal_state(state):
            policy[(state.x, state.y)] = None
            continue
        actions = [action for action in problem.get_actions(state)]
        policy[(state.x, state.y)] = random.choice(actions)
    return policy
</code>

{{:courses:be5b33kui:labs:sequential_decisions:node_utils.png|}}
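To illustrate what the first task amounts to, here is a minimal, self-contained sketch of value iteration on a toy 3-state chain MDP. The transition model, step reward, and discount factor below are illustrative assumptions and do not use the kuimaze API; in the assignment you would obtain states via ''get_all_states()'' and transitions via ''get_next_states_and_probs(state, action)'' instead.

<code python>
# Toy MDP (assumed for illustration, not the kuimaze setup):
# states 0..2 on a chain; state 2 is terminal with reward +1;
# actions 'left'/'right' succeed with prob 0.8 and stay put with prob 0.2.
GAMMA = 0.9        # discount factor (assumed)
STEP_REWARD = -0.04  # living reward per step (assumed)
STATES = [0, 1, 2]
TERMINAL = {2: 1.0}
ACTIONS = ['left', 'right']


def next_states_and_probs(s, a):
    # Analogue of kuimaze's get_next_states_and_probs for the toy chain.
    target = max(s - 1, 0) if a == 'left' else min(s + 1, 2)
    return [(target, 0.8), (s, 0.2)]


def value_iteration(eps=1e-6):
    # Utilities start at zero; terminal states keep their reward.
    utils = {s: TERMINAL.get(s, 0.0) for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s in TERMINAL:
                continue
            # Bellman update: best expected utility over actions.
            best = max(
                sum(p * utils[s2] for s2, p in next_states_and_probs(s, a))
                for a in ACTIONS
            )
            new = STEP_REWARD + GAMMA * best
            delta = max(delta, abs(new - utils[s]))
            utils[s] = new
        if delta < eps:  # stop once updates are negligible
            return utils
</code>

With these numbers the utilities increase toward the terminal state, e.g. ''value_iteration()'' yields roughly 0.68 for state 0 and 0.83 for state 1.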
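Policy iteration can be sketched on the same assumed toy chain MDP. The structure mirrors what the assignment asks for: repeated policy evaluation (here a simplified iterative evaluation rather than an exact linear solve) followed by greedy policy improvement, stopping when the policy no longer changes.

<code python>
# Same toy MDP as assumed above (illustrative, not the kuimaze setup).
GAMMA = 0.9
STEP_REWARD = -0.04
STATES = [0, 1, 2]
TERMINAL = {2: 1.0}
ACTIONS = ['left', 'right']


def next_states_and_probs(s, a):
    target = max(s - 1, 0) if a == 'left' else min(s + 1, 2)
    return [(target, 0.8), (s, 0.2)]


def policy_iteration():
    # Arbitrary initial policy for non-terminal states.
    policy = {s: 'left' for s in STATES if s not in TERMINAL}
    utils = {s: TERMINAL.get(s, 0.0) for s in STATES}
    while True:
        # Policy evaluation: iterate the fixed-policy Bellman update
        # (a fixed number of sweeps is enough for this tiny example).
        for _ in range(50):
            for s in policy:
                utils[s] = STEP_REWARD + GAMMA * sum(
                    p * utils[s2]
                    for s2, p in next_states_and_probs(s, policy[s]))
        # Policy improvement: act greedily w.r.t. the evaluated utilities.
        stable = True
        for s in policy:
            best = max(ACTIONS, key=lambda a: sum(
                p * utils[s2] for s2, p in next_states_and_probs(s, a)))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:  # policy unchanged => optimal for this MDP
            return policy, utils
</code>

On this chain the method converges to the policy that moves right in both non-terminal states, matching the utilities found by value iteration.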