06 Sequential I

What if we need to make a sequence of decisions under uncertainty, where each decision influences the decisions we face later?

Learning outcomes

After this practice session, the student

  • can define a Markov decision process and understands the terms policy, episode, and return;
  • can estimate a state value from several episodes.

Program

  • Discussion of last week's bonus quiz (siblings, conditional probabilities)
  • Basics of MDPs
  • Evaluating a state using several episodes
  • MDP assignment introduction

Intro to MDP

  • What do you need to know to have a fully specified Markov decision process (MDP)?
  • What is a policy? What is an episode?
  • How do you compute the return of an episode?
  • How do you estimate the value of a state from several episodes?
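
The last two questions can be made concrete with a short sketch. It is only an illustration, not the method prescribed in the lab: it assumes an episode is recorded as a list of (state, reward) pairs, where the reward is the one received after leaving that state, and it uses a first-visit Monte Carlo average (an every-visit average would be an equally valid choice). The function names, the discount factor `gamma`, and the toy episodes are all made up for this example.

```python
from typing import Hashable, List, Optional, Sequence, Tuple

# Assumed episode format: (state, reward received after leaving that state).
Step = Tuple[Hashable, float]


def discounted_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for one reward sequence."""
    g = 0.0
    # Accumulate from the end of the episode: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def first_visit_value(
    episodes: Sequence[List[Step]], state: Hashable, gamma: float = 1.0
) -> Optional[float]:
    """Average the returns observed from the first visit to `state` in each episode."""
    returns = []
    for episode in episodes:
        for t, (s, _) in enumerate(episode):
            if s == state:
                rewards = [r for _, r in episode[t:]]
                returns.append(discounted_return(rewards, gamma))
                break  # first visit only; later visits in this episode are ignored
    if not returns:
        return None  # the state never occurred, so there is nothing to average
    return sum(returns) / len(returns)


# Hypothetical toy data: three episodes over states A, B, C.
episodes = [
    [("A", -1.0), ("B", -1.0), ("A", -1.0), ("C", 10.0)],
    [("B", -1.0), ("C", 10.0)],
    [("A", -1.0), ("C", 10.0)],
]

value_a = first_visit_value(episodes, "A")  # averages the returns 7 and 9
value_b = first_visit_value(episodes, "B")  # averages the returns 8 and 9
```

With `gamma = 1.0` the return is simply the sum of the rewards collected from the visit onward, so the estimate for state A is the mean of 7 and 9. The more episodes pass through a state, the more returns are averaged and the less noisy the estimate becomes.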

Exercise / Solving together

+ other exercises [See pdf]

Bonus quiz

Navigating through a gridworld and calculating the proper path.

  • 0.5 points
  • submit your solution to the BRUTE task lab06quiz; the deadline is listed in BRUTE.
  • format: text file, photo of your solution on paper, or PDF, whichever is convenient for you
  • the solution will be discussed in the next lab
  • Solve and submit the right version according to the first character of your family name:
    • family name starting with A to K: version A
    • family name starting with L to Z: version B.

Homework

courses/be5b33kui/labs/weekly/week_06.txt · Last modified: 2026/03/30 14:10 by xposik