Search
Sequential decisions: how do we compute a proper policy?
Policy evaluation on a small map.
MDP
Consider the following game. We roll a die and pay 1 CZK for each roll. If we roll a six twice in a row, we win 1000 CZK and the game is over. The game can be terminated at any time without payment. 1) Formulate it as an MDP task (states, actions, T(s, a, s'), r(s, a, s')). 2) Determine the optimal policy.
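One possible way to check an answer to the dice game is a small value-iteration sketch. It is an illustration under stated assumptions, not the official solution: the state names "no_six" and "one_six", the function names, and the tolerance are all made up here; the 1 CZK is assumed to be paid on every roll, stopping is assumed to give reward 0, and no discounting is used.

```python
# Value iteration for the dice game (a sketch under the assumptions above).
# States (hypothetical names):
#   "no_six"  - no roll yet, or the last roll was not a six
#   "one_six" - the last roll was a six
# The end of the game is an absorbing terminal state with value 0.
# Actions: "roll" (pay 1 CZK, then roll) and "stop" (end the game, reward 0).

P_SIX = 1 / 6       # probability of rolling a six with a fair die
ROLL_COST = 1.0     # CZK paid for each roll
PRIZE = 1000.0      # CZK won after two sixes in a row


def q_values(V):
    """Return Q(s, a) for both non-terminal states given state values V."""
    # From "no_six": a six moves to "one_six", anything else stays in "no_six".
    roll_no_six = -ROLL_COST + P_SIX * V["one_six"] + (1 - P_SIX) * V["no_six"]
    # From "one_six": a six wins the prize, anything else falls back to "no_six".
    roll_one_six = -ROLL_COST + P_SIX * PRIZE + (1 - P_SIX) * V["no_six"]
    return {
        "no_six": {"stop": 0.0, "roll": roll_no_six},
        "one_six": {"stop": 0.0, "roll": roll_one_six},
    }


def value_iteration(tol=1e-10, max_iter=100_000):
    """Iterate the Bellman optimality update until the values stop changing."""
    V = {"no_six": 0.0, "one_six": 0.0}
    for _ in range(max_iter):
        q = q_values(V)
        new_V = {s: max(q[s].values()) for s in V}
        delta = max(abs(new_V[s] - V[s]) for s in V)
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the final values.
    q = q_values(V)
    policy = {s: max(q[s], key=q[s].get) for s in V}
    return V, policy


if __name__ == "__main__":
    V, policy = value_iteration()
    print(V, policy)
```

Under these assumptions the values should approach 958 for "no_six" and 964 for "one_six", with "roll" chosen in both states. This agrees with the hand calculation: two sixes in a row take 42 rolls in expectation, so always rolling is worth 1000 - 42 = 958 CZK from the start, which is better than stopping with 0.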
Work with the Markov decision process task.