====== 07 Sequential II ======

Sequential decision making: how do we compute the optimal policy?

===== Learning outcomes =====

After this practice session, the student

  * computes the state values in simple grid worlds;
  * derives the policy from known state values;
  * understands the difference between state values (v) and state-action values (q).

===== Program =====

  * Discussion of the bonus quiz from last week (probability of reaching the goal state).
  * Finding the optimal policy using value iteration and policy iteration.
  * Introduction of the bonus quiz for next week (policy evaluation).

===== Exercise / Solving together =====

[See {{ :courses:be5b33kui:labs:weekly:Value_Policy_Iteration_example_en.pdf |pdf}}]

===== Bonus quiz =====

Policy evaluation on a small map.

  * 0.5 points
  * Submit your solution to [[https://cw.felk.cvut.cz/brute/|BRUTE]] **lab07quiz**; the deadline is in BRUTE.
  * Format: text file, photo of your solution on paper, or pdf, whichever is convenient for you.
  * The solution will be discussed in the next lab.
  * Students whose family name starts with A to K (inclusive) have to solve and upload {{ :courses:be5b33kui:labs:weekly:policyevaluation_a_2025.pdf |subject A}}, while students with a family name from L to Z have to solve and upload {{ :courses:be5b33kui:labs:weekly:policyevaluation_b_2025.pdf |subject B}}.

> {{page>courses:be5b33kui:internal:quizzes##Policy evaluation.}}

===== Homework =====

  * Finish and submit the bonus quiz, ''lab07quiz''.
  * Work on the [[courses:be5b33kui:semtasks:03_mdp:start|Markov decision processes]] semestral task.
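As a supplement to the value-iteration part of the program, here is a minimal Python sketch of value iteration followed by greedy policy extraction (deriving the policy from known state values via the state-action values q). It is not part of the official lab materials: the 1×4 grid world, the rewards, the discount factor, and all function names are illustrative assumptions, not the assignment's setup.

```python
# Toy MDP: a 1x4 grid world with states 0..3; state 3 is the terminal
# goal. Moves are deterministic (left/right), clamped at the edges.
# All constants below are illustrative assumptions, not the lab's values.
GAMMA = 0.9          # discount factor
STEP_REWARD = -0.04  # reward for a move that does not reach the goal
STATES = [0, 1, 2, 3]
GOAL = 3
ACTIONS = (-1, +1)   # -1 = left, +1 = right

def step(s, a):
    """Deterministic transition: move by a, clamped to the grid."""
    return min(max(s + a, 0), 3)

def reward(s, a, s2):
    """+1 for entering the goal, otherwise the small step cost."""
    return 1.0 if s2 == GOAL else STEP_REWARD

def q_value(s, a, v):
    """State-action value q(s, a) under the current estimate v."""
    s2 = step(s, a)
    return reward(s, a, s2) + GAMMA * v[s2]

def value_iteration(theta=1e-6):
    """Iterate v(s) <- max_a q(s, a) until the largest change < theta."""
    v = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s == GOAL:
                continue  # terminal state keeps value 0
            new_v = max(q_value(s, a, v) for a in ACTIONS)
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < theta:
            return v

def greedy_policy(v):
    """Derive the policy from state values: pick argmax_a q(s, a)."""
    return {s: max(ACTIONS, key=lambda a: q_value(s, a, v))
            for s in STATES if s != GOAL}

v = value_iteration()
pi = greedy_policy(v)
print({s: round(x, 3) for s, x in v.items()})  # converged state values
print(pi)                                      # every state moves right (+1)
```

The same `q_value` helper shows the distinction drilled in the learning outcomes: v(s) is the best achievable value from a state, while q(s, a) is the value of committing to action a first and acting optimally afterwards; the greedy policy is just the argmax of q at each state.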