CourseWare Wiki courses:smu:tutorials

courses:smu:tutorials:tutorial9 (revision of 2026-02-09T09:54:07+0200, Anonymous)
https://cw.fel.cvut.cz/b252/courses/smu/tutorials/tutorial9?rev=1770627247&do=diff

Tutorial 3 - reinforcement learning III.

Problem 1 - Passive reinforcement learning

Consider the following MDP. Assume that the reward has the form $r(s,a)$, i.e., $r: S \times A \mapsto \mathbb{R}$. Set $\gamma = \frac{1}{2}$.

Suppose that you have seen the following sequence of states, actions, and rewards:

$$ s_1, \mathrm{switch}, s_2, \mathrm{stay}, +1, s_2, \mathrm{stay}, +1, s_2, \mathrm{switch}, s_1, \mathrm{stay}, s_1, \mathrm{switch}, s_1, \mathrm{switch}, s_1, \mathrm{…