Sequential decisions: how do we calculate a proper policy?
Policy evaluation on a small map.
MDP
Consider the following game.
We roll a die and pay 1 CZK for each roll. If we roll a six twice in a row, we win 1000 CZK and the game is over.
We may quit the game at any time at no cost.
1) Formulate the game as an MDP (states, actions, T(s, a, s'), r(s, a, s')).
2) Determine the optimal policy.
Work with the Markov decision process formulation.
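One possible way to set up and solve the dice game is sketched below. It is only an illustration under stated assumptions, not the reference solution: the die is assumed fair and six-sided, rewards are undiscounted, and the state names (s0 = the last roll was not a six, s1 = the last roll was a six, end = game over), the action names (roll, quit) and the helper functions are hypothetical, chosen here for the example. The 1 CZK fee is charged as a reward of -1 on every roll, so the winning roll has reward 1000 - 1 = 999.

```python
# Sketch of the dice game as an MDP, solved by value iteration.
# Assumptions (not given in the exercise): fair six-sided die, no discounting,
# hypothetical state names "s0", "s1", "end" and action names "roll", "quit".

STATES = ["s0", "s1"]       # s0: last roll was not a six, s1: last roll was a six
ACTIONS = ["roll", "quit"]
P_SIX = 1.0 / 6.0           # probability of rolling a six with a fair die


def transitions(state, action):
    """Return a list of (T(s, a, s'), s', r(s, a, s')) triples."""
    if action == "quit":
        return [(1.0, "end", 0.0)]          # quitting is free and ends the game
    if state == "s0":
        return [(P_SIX, "s1", -1.0), (1.0 - P_SIX, "s0", -1.0)]
    # state == "s1": a second six wins 1000 CZK, minus 1 CZK paid for the roll
    return [(P_SIX, "end", 999.0), (1.0 - P_SIX, "s0", -1.0)]


def q_value(state, action, V):
    """Expected return of taking `action` in `state`, then following V."""
    return sum(p * (r + V[s2]) for p, s2, r in transitions(state, action))


def value_iteration(tol=1e-10, max_iters=100_000):
    V = {"s0": 0.0, "s1": 0.0, "end": 0.0}  # the terminal state keeps value 0
    for _ in range(max_iters):
        delta = 0.0
        for s in STATES:
            best = max(q_value(s, a, V) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best                      # in-place (Gauss-Seidel) update
        if delta < tol:
            break
    policy = {s: max(ACTIONS, key=lambda a: q_value(s, a, V)) for s in STATES}
    return V, policy


V, policy = value_iteration()
print(V, policy)
```

Under these assumptions the values converge to about 958 for s0 and 964 for s1, and the greedy policy is to roll in both states. This agrees with a hand check: two sixes in a row take 42 rolls on average from s0 and 36 from s1, so the expected profit is 1000 - 42 and 1000 - 36 respectively.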