====== 06 Sequential I ======

What if we need to decide multiple times under uncertainty, with our current decisions influencing our future decisions?

===== Learning outcomes =====

After this practice session, the student

  * can define a Markov decision process and understands the terms //policy//, //episode//, and //return//;
  * can estimate a state value from several episodes.

===== Program =====

  * Discussion of the bonus quiz from last week (siblings, conditional probabilities)
  * Basics of MDPs
  * Evaluating a state using several episodes
  * MDP assignment introduction

===== Intro to MDP =====

  * What do you need to know to have a fully specified Markov decision process (MDP)?
  * What is a policy? What is an episode?
  * How do you compute the //return// of an episode?
  * How do you estimate the value of a state from several episodes?

===== Exercise / Solving together =====

> {{page>courses:be5b33kui:internal:quizzes##State value evaluation.}}

+ other exercises [See {{ :courses:be5b33kui:labs:weekly:MDP_example.pdf |pdf}}]

===== Bonus quiz =====

Navigate through a gridworld and calculate the proper path.

  * 0.5 points
  * submit your solution to [[https://cw.felk.cvut.cz/brute/|BRUTE]] task **lab06quiz**; the deadline is in BRUTE
  * format: text file, photo of your solution on paper, or pdf, whatever is convenient for you
  * the solution will be discussed in the next lab
  * Solve and submit the right version according to the first character of your family name:
    * family name starting with A to K: {{ :courses:be5b33kui:labs:weekly:GridWorld_a_2025.pdf |version A}}
    * family name starting with L to Z: {{ :courses:be5b33kui:labs:weekly:GridWorld_b_2025.pdf |version B}}

> {{page>courses:be5b33kui:internal:quizzes##grid_world}}

===== Homework =====

  * Submit your solution of the bonus quiz to BRUTE, task ''lab06quiz''.
  * [[courses:be5b33kui:semtasks:03_mdp:start|Markov decision process]].
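
The two estimation questions from the Intro can be sketched in code: the //return// of an episode is the discounted sum of its rewards, and a state value can be estimated as the average return over several episodes starting in that state. A minimal illustration (the discount factor ''GAMMA'' and the reward sequences below are made-up assumptions, not data from the lab exercises):

<code python>
# Sketch: return of an episode and Monte Carlo estimate of a state value.
# GAMMA and the example episodes are hypothetical, chosen for illustration.

GAMMA = 0.9

def episode_return(rewards, gamma=GAMMA):
    """Discounted return G = r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):  # fold from the end: g <- r + gamma * g
        g = r + gamma * g
    return g

def estimate_state_value(episodes, gamma=GAMMA):
    """Estimate V(s) as the average return over episodes that start in s."""
    returns = [episode_return(rs, gamma) for rs in episodes]
    return sum(returns) / len(returns)

# Three hypothetical episodes, each a list of rewards observed from state s:
episodes = [
    [0, 0, 1],  # G = 0 + 0.9*0 + 0.81*1 = 0.81
    [0, 1],     # G = 0 + 0.9*1 = 0.9
    [1],        # G = 1.0
]

print(estimate_state_value(episodes))  # average of 0.81, 0.9, and 1.0
</code>

With more episodes, this sample average converges to the true value of the state under the policy that generated the episodes.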