07 Sequential II

Sequential decisions: how do we calculate the optimal policy?

Learning outcomes

After this practice session, the student

  • computes the state values in simple grid worlds;
  • derives the policy from known state values;
  • understands the difference between state values (v) and state-action values (q).
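
To illustrate the last two outcomes, here is a minimal sketch that is not part of the lab materials. It assumes a hypothetical deterministic corridor of four states (state 3 is terminal), a reward of -1 per step and no discounting; all names and numbers are made up for the example. The state value v(s) rates a state, the state-action value q(s, a) rates taking action a in state s, and the greedy policy picks the action with the highest q.

```python
# Hypothetical 1-D corridor: states 0..3, state 3 is terminal.
GAMMA = 1.0          # assumed discount factor
STEP_REWARD = -1.0   # assumed reward for every move
N_STATES = 4
TERMINAL = 3
ACTIONS = {"left": -1, "right": +1}


def next_state(s, a):
    """Deterministic move; bumping into a wall keeps the agent in place."""
    return min(max(s + ACTIONS[a], 0), N_STATES - 1)


def q_from_v(v, s, a):
    """q(s, a) = r + gamma * v(s') for a deterministic transition."""
    return STEP_REWARD + GAMMA * v[next_state(s, a)]


def greedy_policy(v):
    """For each non-terminal state, pick the action with the largest q(s, a)."""
    return {
        s: max(ACTIONS, key=lambda a: q_from_v(v, s, a))
        for s in range(N_STATES)
        if s != TERMINAL
    }


# Known state values: minus the number of steps to the terminal state.
v = [-3.0, -2.0, -1.0, 0.0]
q_values = {(s, a): q_from_v(v, s, a) for s in range(TERMINAL) for a in ACTIONS}
policy = greedy_policy(v)
```

In this sketch q(1, left) = -4 and q(1, right) = -2, so the greedy policy moves right in every non-terminal state, while v(1) = -2 equals the larger of the two q values.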

Program

  • Discussion of the bonus quiz from last week (probability of reaching the goal state).
  • Finding the optimal policy using value iteration and policy iteration.
  • Introduction of the bonus quiz for next week (policy evaluation).
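
As a rough illustration of value iteration, the sketch below runs it on a hypothetical four-state corridor. The map, the slip probability of 0.2 (the agent stays in place), the reward of -1 per step and the discount factor of 0.9 are all assumptions chosen for the example, not the setting used in the lab. The values are updated in place during each sweep.

```python
# Hypothetical stochastic corridor: states 0..3, state 3 is the goal.
GAMMA = 0.9    # assumed discount factor
SLIP = 0.2     # assumed probability that the move fails and the agent stays
N_STATES = 4
GOAL = 3
ACTIONS = {"left": -1, "right": +1}


def transitions(s, a):
    """Return (probability, next_state, reward) triples for action a in state s."""
    target = min(max(s + ACTIONS[a], 0), N_STATES - 1)
    return [(1.0 - SLIP, target, -1.0), (SLIP, s, -1.0)]


def q_value(v, s, a):
    """Expected return of taking a in s and then following the values v."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in transitions(s, a))


def value_iteration(tol=1e-10):
    """Apply the Bellman optimality update until the largest change is below tol."""
    v = [0.0] * N_STATES  # the goal keeps value 0 because it is never updated
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == GOAL:
                continue
            best = max(q_value(v, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            return v


v_star = value_iteration()
# Read the policy off the converged values by acting greedily.
pi_star = {
    s: max(ACTIONS, key=lambda a: q_value(v_star, s, a))
    for s in range(N_STATES)
    if s != GOAL
}
```

Policy iteration would reach the same policy here by alternating a full evaluation of the current policy with a greedy improvement step, instead of taking the maximum inside every sweep.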

Exercise / Solving together

[See pdf]

Bonus quiz

Policy evaluation on a small map.
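
For orientation only, this is a minimal sketch of iterative policy evaluation on a hypothetical 2x2 map; it is not the quiz map and not the quiz solution. It assumes deterministic moves, a reward of -1 per step, a discount factor of 0.9, a goal in cell (0, 1) and a fixed policy that is deliberately not optimal in cell (1, 1).

```python
# Hypothetical 2x2 map; cells are (row, col), the goal is terminal.
GAMMA = 0.9          # assumed discount factor
STEP_REWARD = -1.0   # assumed reward for every move
ROWS, COLS = 2, 2
GOAL = (0, 1)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

# Hypothetical fixed policy to evaluate (going "up" from (1, 1) would be better).
POLICY = {(0, 0): "right", (1, 0): "up", (1, 1): "left"}


def step(state, action):
    """Deterministic move; leaving the map keeps the agent in place."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < ROWS and 0 <= nc < COLS:
        return (nr, nc)
    return state


def evaluate_policy(policy, tol=1e-10):
    """Repeat v(s) <- r + gamma * v(s') under the given policy until stable."""
    v = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
    while True:
        delta = 0.0
        for s, a in policy.items():  # the goal is not in the policy, so v stays 0
            new_value = STEP_REWARD + GAMMA * v[step(s, a)]
            delta = max(delta, abs(new_value - v[s]))
            v[s] = new_value
        if delta < tol:
            return v


v_pi = evaluate_policy(POLICY)
```

Under these assumptions the values are v(0, 0) = -1, v(1, 0) = -1.9 and v(1, 1) = -2.71: policy evaluation reports the value of the given policy, not of the best possible one.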

  • 0.5 points
  • submit your solution to BRUTE lab07quiz, deadline in BRUTE
  • format: text file, photo of your solution on paper, or pdf, whichever is convenient for you
  • the solution will be discussed in the next lab
  • Students whose family name starts with A to K (inclusive) must solve and upload subject A, while students whose family name starts with L to Z must solve and upload subject B.

Homework

courses/be5b33kui/labs/weekly/week_07.txt · Last modified: 2026/03/30 16:02 by xposik