06 Sequential I

What if we need to make several decisions under uncertainty, and each decision influences the decisions we face later?

Learning outcomes

After this practice session, the student

can define a Markov decision process and understands the terms policy, episode, and return;
can estimate a state value from several episodes.

Program

Discussion of last week's bonus quiz (siblings, conditional probabilities)
Basics of MDPs
Evaluating a state using several episodes
MDP assignment introduction

Intro to MDP

What do you need to know to have a fully specified Markov decision process (MDP)?
What is a policy? What is an episode?
How to compute the return of an episode?
How to estimate the value of a state from several episodes?
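
To make the last two questions concrete, here is a minimal Python sketch, not the course's reference solution. It assumes an episode is recorded as a list of (state, reward) pairs, where the reward is the one received after leaving that state. It also assumes the first-visit convention (only the first occurrence of a state in an episode contributes a return); the lab may use a different convention. The names discounted_return, estimate_state_values, and the example episodes are hypothetical.

```python
from collections import defaultdict


def discounted_return(rewards, gamma=1.0):
    """Return r_0 + gamma*r_1 + gamma^2*r_2 + ... for one reward sequence."""
    g = 0.0
    # Accumulate from the end so each step needs one multiply and one add.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def estimate_state_values(episodes, gamma=1.0):
    """Average the returns observed after the first visit to each state.

    Assumption: each episode is a list of (state, reward) pairs.
    """
    returns = defaultdict(list)
    for episode in episodes:
        rewards = [r for _, r in episode]
        seen = set()
        for t, (state, _) in enumerate(episode):
            if state in seen:
                continue  # first-visit convention: skip repeated visits
            seen.add(state)
            returns[state].append(discounted_return(rewards[t:], gamma))
    # The value estimate of a state is the mean of its collected returns.
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}


# Made-up episodes for illustration only.
episodes = [
    [("A", -1), ("B", -1), ("C", 10)],
    [("A", -1), ("C", 10)],
    [("B", -1), ("A", -1), ("B", -1), ("C", 10)],
]
values = estimate_state_values(episodes, gamma=1.0)
print(values)
```

With more episodes the averages settle closer to the true state values under the policy that generated the episodes; with a discount factor gamma below 1, later rewards count for less.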

Exercise / Solving together

+ other exercises [See pdf]

Bonus quiz

Navigating through a gridworld and calculating the proper path.

0.5 points
submit your solution to BRUTE, task lab06quiz; the deadline is given in BRUTE.
format: text file, photo of your solution on paper, or pdf, whichever is convenient for you
the solution will be discussed in the next lab
Solve and submit the right version according to the first character of your family name:
- family name starting with A to K: version A
- family name starting with L to Z: version B.

Homework

Submit your solution to the bonus quiz to BRUTE, task lab06quiz.
Markov decision process.