06 Sequential I
What if we need to make a sequence of decisions under uncertainty, where each decision influences the decisions that follow?
Learning outcomes
After this practice session, the student
can define a Markov decision process and understands the terms policy, episode, and return;
can estimate a state value from several episodes.
Program
Discussion of last week's bonus quiz (siblings, conditional probabilities)
Basics of MDPs
Evaluating a state using several episodes
MDP assignment introduction
Intro to MDP
What do you need to know to have a fully specified Markov decision process (MDP)?
What is a policy? What is an episode?
How to compute the return of an episode?
How to estimate the value of a state from several episodes?
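The last two questions can be illustrated with a minimal sketch. It assumes that an episode is recorded as a list of (state, reward) pairs, where the reward is the one received after leaving that state, and that the value of a state is estimated by averaging the returns observed from the first visit to that state in each episode (first-visit Monte Carlo). The function names, the episode encoding, and the toy data are illustrative assumptions, not part of the assignment.

```python
from typing import Hashable, List, Sequence, Tuple

# Assumed encoding: an episode is a list of (state, reward) pairs, where
# `reward` is the reward received after leaving `state`.
Episode = Sequence[Tuple[Hashable, float]]


def discounted_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Compute r_0 + gamma * r_1 + gamma^2 * r_2 + ... for one reward sequence."""
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def estimate_state_value(
    episodes: Sequence[Episode], state: Hashable, gamma: float = 1.0
) -> float:
    """Average the returns that follow the first visit to `state` in each episode."""
    returns: List[float] = []
    for episode in episodes:
        visited = [s for s, _ in episode]
        if state not in visited:
            continue  # this episode carries no information about `state`
        first = visited.index(state)
        rewards = [r for _, r in episode[first:]]
        returns.append(discounted_return(rewards, gamma))
    if not returns:
        raise ValueError(f"state {state!r} does not occur in any episode")
    return sum(returns) / len(returns)


# Hypothetical toy data: three episodes over states A, B, C.
episodes = [
    [("A", -1.0), ("B", -1.0), ("C", 10.0)],
    [("A", -1.0), ("C", 10.0)],
    [("B", -1.0), ("A", -1.0), ("B", -1.0), ("C", 10.0)],
]

if __name__ == "__main__":
    print(estimate_state_value(episodes, "A"))  # mean of the returns 8, 9, 8
    print(estimate_state_value(episodes, "B"))  # mean of the returns 9, 7
```

With gamma = 1 the return is simply the sum of the rewards collected from the first visit onward. Averaging over every visit instead of only the first one is an equally common variant and may give a slightly different estimate.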
Exercise / Solving together
+ other exercises [See pdf]
Bonus quiz
Navigating through a gridworld and calculating the proper path.
0.5 points
submit your solution to BRUTE lab06quiz; the deadline is given in BRUTE
format: text file, photo of your solution on paper, or PDF, whichever is convenient for you
the solution will be discussed in the next lab
Solve and submit the right version according to the first character of your family name:
Homework