07 Sequential II

Sequential decisions: how do we compute the optimal policy?

Learning outcomes

After this practice session, the student:

computes the state values in simple grid worlds;
derives the policy from known state values;
understands the difference between state values (v) and state-action values (q).
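
The relation between v and q in the outcomes above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not part of the lab materials: a hypothetical four-cell corridor with a terminal cell on the right, deterministic moves, a step reward of -1, and a discount factor of 1. State-action values are computed from known state values as q(s, a) = r + gamma * v(s'), and the policy picks the action with the largest q in each state.

```python
# Hypothetical corridor world: states 0..3, state 3 is terminal (assumed for illustration).
GAMMA = 1.0          # discount factor (assumed)
STEP_REWARD = -1.0   # reward for every move (assumed)
N_STATES = 4
TERMINAL = 3
ACTIONS = {"left": -1, "right": +1}


def next_state(s, a):
    """Deterministic move; bumping into a wall keeps the agent in place."""
    return min(max(s + ACTIONS[a], 0), N_STATES - 1)


def q_from_v(v, s, a):
    """State-action value from state values: q(s, a) = r + gamma * v(s')."""
    return STEP_REWARD + GAMMA * v[next_state(s, a)]


def greedy_policy(v):
    """For every non-terminal state, pick the action with the largest q(s, a)."""
    return {
        s: max(ACTIONS, key=lambda a: q_from_v(v, s, a))
        for s in range(N_STATES)
        if s != TERMINAL
    }


# Known state values: minus the number of steps remaining to the terminal state.
v = [-3.0, -2.0, -1.0, 0.0]
policy = greedy_policy(v)
print(policy)
```

Note that v stores one number per state, while q stores one number per state-action pair; the policy is read off q directly, whereas reading it off v requires the one-step lookahead done in `q_from_v`.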

Program

Discussion of the bonus quiz from last week (probability of reaching the goal state).
Finding the optimal policy using value iteration and policy iteration.
Introduction of the bonus quiz for next week (policy evaluation).
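
As a rough illustration of the value iteration mentioned in the program, here is a minimal sketch on a made-up problem. The world is an assumption, not the one used in the lab: a hypothetical four-cell corridor, deterministic moves, a reward of +1 for entering the goal cell, and a discount factor of 0.9. Each sweep applies the Bellman optimality update v(s) = max over a of [r + gamma * v(s')] until the values stop changing. Policy iteration is not shown here.

```python
# Hypothetical corridor world: states 0..3, state 3 is the goal (assumed for illustration).
DISCOUNT = 0.9       # discount factor (assumed)
NUM_CELLS = 4
GOAL = 3
MOVES = (-1, +1)     # step left, step right


def step(s, move):
    """Deterministic transition; entering the goal yields reward +1, otherwise 0."""
    s2 = min(max(s + move, 0), NUM_CELLS - 1)
    reward = 1.0 if s2 == GOAL else 0.0
    return s2, reward


def value_iteration(tol=1e-9):
    """Repeat Bellman optimality updates until the largest change drops below tol."""
    values = [0.0] * NUM_CELLS   # the goal keeps value 0 (episode ends there)
    while True:
        delta = 0.0
        for s in range(NUM_CELLS):
            if s == GOAL:
                continue
            outcomes = [step(s, m) for m in MOVES]
            best = max(r + DISCOUNT * values[s2] for s2, r in outcomes)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


optimal_values = value_iteration()
print(optimal_values)
```

In this toy world the values grow by a factor of 1/0.9 with each cell closer to the goal, which is the pattern to look for when checking a hand computation.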

Exercise / Solving together

[See pdf]

Bonus quiz

Policy evaluation on a small map.
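
For orientation, iterative policy evaluation can be sketched as follows. The map, rewards, and policy below are made up for illustration and are not the quiz assignment: a hypothetical four-cell corridor with a terminal cell on the right, a step reward of -1, a discount factor of 1, and a policy that moves left or right with equal probability. Each sweep applies v(s) = sum over a of pi(a|s) * [r + gamma * v(s')] for the fixed policy.

```python
# Hypothetical corridor world: states 0..3, state 3 is terminal (assumed for illustration).
GAMMA_PE = 1.0        # discount factor (assumed)
REWARD_PER_STEP = -1.0
N_CELLS = 4
END = 3


def move_to(s, move):
    """Deterministic transition; bumping into a wall keeps the agent in place."""
    return min(max(s + move, 0), N_CELLS - 1)


def policy_evaluation(pi, tol=1e-10):
    """Evaluate a fixed stochastic policy pi[s][move] = probability of that move."""
    v_pi = [0.0] * N_CELLS   # the terminal state keeps value 0
    while True:
        delta = 0.0
        for s in range(N_CELLS):
            if s == END:
                continue
            new_value = sum(
                prob * (REWARD_PER_STEP + GAMMA_PE * v_pi[move_to(s, move)])
                for move, prob in pi[s].items()
            )
            delta = max(delta, abs(new_value - v_pi[s]))
            v_pi[s] = new_value
        if delta < tol:
            return v_pi


# Uniformly random policy: left (-1) and right (+1) with probability 0.5 each.
random_policy = {s: {-1: 0.5, +1: 0.5} for s in range(N_CELLS) if s != END}
v_random = policy_evaluation(random_policy)
print([round(x, 3) for x in v_random])
```

Unlike value iteration, there is no maximization over actions here: the policy is given, and the result is the value of following it, not the optimal value.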

0.5 points
submit your solution to BRUTE, task lab07quiz; the deadline is shown in BRUTE
format: text file, photo of your solution on paper, or pdf, whichever is convenient for you
the solution will be discussed in the next lab
Students whose family name starts with A to K (inclusive) solve and upload subject A; students whose family name starts with L to Z solve and upload subject B.

Homework

Finish and submit the bonus quiz, lab07quiz.
Work on the Markov decision processes semestral task.

courses/be5b33kui/labs/weekly/week_07.txt · Last modified: 2026/03/30 16:02 by xposik