<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="FeedCreator 1.8" -->
<?xml-stylesheet href="https://cw.fel.cvut.cz/b252/lib/exe/css.php?s=feed" type="text/css"?>
<rdf:RDF
    xmlns="http://purl.org/rss/1.0/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel rdf:about="https://cw.fel.cvut.cz/b252/feed.php">
        <title>CourseWare Wiki courses:smu:tutorials</title>
        <description></description>
        <link>https://cw.fel.cvut.cz/b252/</link>
        <image rdf:resource="https://cw.fel.cvut.cz/b252/lib/tpl/bulma-cw/images/favicon.ico" />
        <dc:date>2026-04-19T10:22:54+0200</dc:date>
        <items>
            <rdf:Seq>
                <rdf:li rdf:resource="https://cw.fel.cvut.cz/b252/courses/smu/tutorials/tutorial9?rev=1770627247&amp;do=diff"/>
            </rdf:Seq>
        </items>
    </channel>
    <image rdf:about="https://cw.fel.cvut.cz/b252/lib/tpl/bulma-cw/images/favicon.ico">
        <title>CourseWare Wiki</title>
        <link>https://cw.fel.cvut.cz/b252/</link>
        <url>https://cw.fel.cvut.cz/b252/lib/tpl/bulma-cw/images/favicon.ico</url>
    </image>
    <item rdf:about="https://cw.fel.cvut.cz/b252/courses/smu/tutorials/tutorial9?rev=1770627247&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2026-02-09T09:54:07+0100</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>courses:smu:tutorials:tutorial9</title>
        <link>https://cw.fel.cvut.cz/b252/courses/smu/tutorials/tutorial9?rev=1770627247&amp;do=diff</link>
        <description>Tutorial 3 - reinforcement learning III.

Problem 1 - Passive reinforcement learning

Consider the following MDP. Assume that the reward has the form $r(s,a)$, i.e., $r: S \times A \to \mathbb{R}$. Set the discount factor $\gamma = \frac{1}{2}$.

Suppose that you have seen the following sequence of states, actions, and rewards:
$$
  s_1, \mathrm{switch},
  s_2, \mathrm{stay}, +1,
  s_2, \mathrm{stay}, +1,
  s_2, \mathrm{switch},
  s_1, \mathrm{stay},
  s_1, \mathrm{switch},
  s_1, \mathrm{switch},
  s_1, \mathrm{…</description>
    </item>
</rdf:RDF>
