====== 06 Sequential I ======

What if we need to decide multiple times under uncertainty, with our current decisions influencing our future decisions?

===== Learning outcomes =====

After this practice session, the student

  * can define a Markov decision process and understands the terms //policy//, //episode//, and //return//;
  * can estimate a state value from several episodes.

===== Program =====

  * Discussion of the bonus quiz from last week (siblings, conditional probabilities)
  * Basics of MDPs
  * Evaluating a state using several episodes
  * MDP assignment introduction

===== Intro to MDP =====

  * What do you need to know to have a fully specified Markov decision process (MDP)?
  * What is a policy? What is an episode?
  * How do you compute the //return// of an episode?
  * How do you estimate the value of a state from several episodes?

===== Exercise / Solving together =====

> {{page>courses:be5b33kui:internal:quizzes##State value evaluation.}}

+ other exercises [See {{ :courses:be5b33kui:labs:weekly:MDP_example.pdf |pdf}}]

===== Bonus quiz =====

Navigate through a gridworld and calculate the proper path.

  * 0.5 points
  * submit your solution to [[https://cw.felk.cvut.cz/brute/|BRUTE]] task **lab06quiz**; the deadline is in BRUTE
  * format: text file, photo of your solution on paper, or pdf, whatever is convenient for you
  * the solution will be discussed in the next lab
  * Solve and submit the right version according to the first character of your family name:
    * family name starting with A to K: {{ :courses:be5b33kui:labs:weekly:GridWorld_a_2025.pdf |version A}}
    * family name starting with L to Z: {{ :courses:be5b33kui:labs:weekly:GridWorld_b_2025.pdf |version B}}

> {{page>courses:be5b33kui:internal:quizzes##grid_world}}

===== Homework =====

  * Submit your solution of the bonus quiz to BRUTE, task ''lab06quiz''.
  * [[courses:be5b33kui:semtasks:03_mdp:start|Markov decision process]].
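
The two estimation questions from the Intro can be sketched in code: the //return// of an episode is the discounted sum of its rewards, and a state value can be estimated as the average return over several episodes starting in that state. A minimal illustration (the discount factor ''GAMMA'' and the reward sequences below are made-up assumptions, not data from the lab exercises):

<code python>
# Sketch: return of an episode and Monte Carlo estimate of a state value.
# GAMMA and the example episodes are hypothetical, chosen for illustration.

GAMMA = 0.9

def episode_return(rewards, gamma=GAMMA):
    """Discounted return G = r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):  # fold from the end: g <- r + gamma * g
        g = r + gamma * g
    return g

def estimate_state_value(episodes, gamma=GAMMA):
    """Estimate V(s) as the average return over episodes that start in s."""
    returns = [episode_return(rs, gamma) for rs in episodes]
    return sum(returns) / len(returns)

# Three hypothetical episodes, each a list of rewards observed from state s:
episodes = [
    [0, 0, 1],  # G = 0 + 0.9*0 + 0.81*1 = 0.81
    [0, 1],     # G = 0 + 0.9*1 = 0.9
    [1],        # G = 1.0
]

print(estimate_state_value(episodes))  # average of 0.81, 0.9, and 1.0
</code>

With more episodes, this sample average converges to the true value of the state under the policy that generated the episodes.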