====== 07 Sequential II ======

Sequential decision making: how do we compute the optimal policy?

===== Learning outcomes =====

After this practice session, the student

  * computes the state values in simple grid worlds;
  * derives the policy from known state values;
  * understands the difference between state values (v) and state-action values (q).

===== Program =====

  * Discussion of the bonus quiz from last week (probability of reaching the goal state).
  * Finding the optimal policy using value iteration and policy iteration.
  * Introduction of the bonus quiz for next week (policy evaluation).

===== Exercise / Solving together =====

[See {{ :courses:be5b33kui:labs:weekly:Value_Policy_Iteration_example_en.pdf |pdf}}]

===== Bonus quiz =====

Policy evaluation on a small map.

  * 0.5 points
  * Submit your solution to [[https://cw.felk.cvut.cz/brute/|BRUTE]] **lab07quiz**; the deadline is in BRUTE.
  * Format: text file, photo of your solution on paper, or pdf, whichever is convenient for you.
  * The solution will be discussed in the next lab.
  * Students whose family name starts with A to K (inclusive) have to solve and upload {{ :courses:be5b33kui:labs:weekly:policyevaluation_a_2025.pdf |subject A}}, while students with a family name from L to Z have to solve and upload {{ :courses:be5b33kui:labs:weekly:policyevaluation_b_2025.pdf |subject B}}.

> {{page>courses:be5b33kui:internal:quizzes##Policy evaluation.}}

===== Homework =====

  * Finish and submit the bonus quiz, ''lab07quiz''.
  * Work on the [[courses:be5b33kui:semtasks:03_mdp:start|Markov decision processes]] semestral task.
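As a supplement to the value-iteration part of the program, here is a minimal Python sketch of value iteration followed by greedy policy extraction (deriving the policy from known state values via the state-action values q). It is not part of the official lab materials: the 1×4 grid world, the rewards, the discount factor, and all function names are illustrative assumptions, not the assignment's setup.

```python
# Toy MDP: a 1x4 grid world with states 0..3; state 3 is the terminal
# goal. Moves are deterministic (left/right), clamped at the edges.
# All constants below are illustrative assumptions, not the lab's values.
GAMMA = 0.9          # discount factor
STEP_REWARD = -0.04  # reward for a move that does not reach the goal
STATES = [0, 1, 2, 3]
GOAL = 3
ACTIONS = (-1, +1)   # -1 = left, +1 = right

def step(s, a):
    """Deterministic transition: move by a, clamped to the grid."""
    return min(max(s + a, 0), 3)

def reward(s, a, s2):
    """+1 for entering the goal, otherwise the small step cost."""
    return 1.0 if s2 == GOAL else STEP_REWARD

def q_value(s, a, v):
    """State-action value q(s, a) under the current estimate v."""
    s2 = step(s, a)
    return reward(s, a, s2) + GAMMA * v[s2]

def value_iteration(theta=1e-6):
    """Iterate v(s) <- max_a q(s, a) until the largest change < theta."""
    v = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s == GOAL:
                continue  # terminal state keeps value 0
            new_v = max(q_value(s, a, v) for a in ACTIONS)
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < theta:
            return v

def greedy_policy(v):
    """Derive the policy from state values: pick argmax_a q(s, a)."""
    return {s: max(ACTIONS, key=lambda a: q_value(s, a, v))
            for s in STATES if s != GOAL}

v = value_iteration()
pi = greedy_policy(v)
print({s: round(x, 3) for s, x in v.items()})  # converged state values
print(pi)                                      # every state moves right (+1)
```

The same `q_value` helper shows the distinction drilled in the learning outcomes: v(s) is the best achievable value from a state, while q(s, a) is the value of committing to action a first and acting optimally afterwards; the greedy policy is just the argmax of q at each state.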