
07 Sequential II

Sequential decisions: how do we calculate the proper policy?

Quiz

Policy evaluation on a small map.

MDP

Consider the following game. We roll a die repeatedly and pay 1 CZK for each roll. If we roll a six twice in a row, we win 1000 CZK and the game is over.
The game can be terminated at any time at no cost.
1) Formulate the game as an MDP (states, actions, T(s, a, s'), r(s, a, s')).
2) Determine the optimal policy.
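
One possible way to check an answer to both parts is a short value-iteration sketch. It is not the official solution, and it rests on several assumptions that are not in the task text: the state names `no_six` and `one_six` (whether the previous roll was a six), no discounting (gamma = 1), the 1 CZK fee being paid on the winning roll as well, and stopping modeled as a zero-reward move to a terminal state.

```python
# Sketch: value iteration for the dice game, undiscounted (gamma = 1).
# State names and the reward bookkeeping are illustrative assumptions,
# not part of the original task statement.

P_SIX = 1.0 / 6.0    # probability of rolling a six with a fair die
ROLL_COST = 1.0      # CZK paid for every roll
PRIZE = 1000.0       # CZK won after two sixes in a row

STATES = ("no_six", "one_six")   # previous roll was not a six / was a six
ACTIONS = ("roll", "stop")


def q_value(state, action, V):
    """Expected return of taking `action` in `state`, then following V."""
    if action == "stop":
        return 0.0  # stopping is free and ends the game
    if state == "no_six":
        return -ROLL_COST + P_SIX * V["one_six"] + (1 - P_SIX) * V["no_six"]
    # state == "one_six": a second six wins the prize and ends the game
    return -ROLL_COST + P_SIX * PRIZE + (1 - P_SIX) * V["no_six"]


def value_iteration(tol=1e-9, max_iters=100_000):
    """Repeat Bellman optimality backups until the values stop changing."""
    V = {s: 0.0 for s in STATES}
    for _ in range(max_iters):
        new_V = {s: max(q_value(s, a, V) for a in ACTIONS) for s in STATES}
        delta = max(abs(new_V[s] - V[s]) for s in STATES)
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the converged values
    policy = {s: max(ACTIONS, key=lambda a: q_value(s, a, V)) for s in STATES}
    return V, policy


V, policy = value_iteration()
print(V, policy)
```

Under these assumptions the values converge to about 958 CZK for `no_six` and 964 CZK for `one_six`, and the greedy policy is to roll in both states. This agrees with the expected 42 rolls needed to see two sixes in a row (1000 - 42 = 958).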

Individual task

Work on the Markov decision process task.

courses/be5b33kui/labs/weekly/week_07.txt · Last modified: 2020/04/16 13:22 by hoffmmat