====== 08 Reinforcement Learning I ======

We do not know the model of the robot agent: it behaves somewhat strangely, the path to the goal is unknown, and there are traps along the way. What do we do?

===== Learning outcomes =====
After this practice session, the student
  * understands what information is available to an agent in an RL task;
  * estimates the environment characteristics (MDP parameters) from the available data (episodes).
===== Program =====
  * Discussion of the bonus quiz from last week (policy evaluation)
  * Estimation of environment features (parameters of the MDP) from episodes
  * Introduction of the bonus quiz for this week


===== Exercise / Solving together =====
  * Policy estimation from training episodes {{ :courses:be5b33kui:labs:weekly:policy_estimation_example.pdf |pdf}} using model-based learning.

> {{page>courses:be5b33kui:internal:quizzes#policy_estimation_from_training_episodes}}
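
The first step of the model-based approach, estimating the MDP parameters from recorded episodes, can be sketched as follows. This is a minimal illustration, not the course's reference solution: the function name ''estimate_mdp'', the episode format (a list of ''(state, action, next_state, reward)'' tuples) and the toy data are all assumptions made for this example. Transition probabilities are estimated as relative frequencies and rewards as sample averages.

```python
from collections import defaultdict


def estimate_mdp(episodes):
    """Estimate T(s, a, s') and R(s, a, s') from observed transitions.

    Hypothetical input format: each episode is a list of
    (state, action, next_state, reward) tuples.
    """
    counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
    reward_sums = defaultdict(float)                # (s, a, s') -> sum of rewards

    for episode in episodes:
        for s, a, s_next, r in episode:
            counts[(s, a)][s_next] += 1
            reward_sums[(s, a, s_next)] += r

    T, R = {}, {}
    for (s, a), successors in counts.items():
        total = sum(successors.values())
        for s_next, n in successors.items():
            # Relative frequency of landing in s' after taking a in s.
            T[(s, a, s_next)] = n / total
            # Average reward observed on this transition.
            R[(s, a, s_next)] = reward_sums[(s, a, s_next)] / n
    return T, R


# Toy data (made up for illustration only).
episodes = [
    [("A", "right", "B", -1), ("B", "right", "C", 10)],
    [("A", "right", "A", -1), ("A", "right", "B", -1), ("B", "right", "C", 10)],
]
T, R = estimate_mdp(episodes)
for key in sorted(T):
    print(key, "T =", round(T[key], 3), "R =", R[key])
```

With the estimated ''T'' and ''R'' in hand, a policy can then be computed with the planning methods for known MDPs (for example value iteration). Note that state-action pairs that never appear in the episodes get no estimate at all.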

===== Bonus quiz =====
  * Direct Q value evaluation
  * 0.5 points
  * Submit your solution to [[https://cw.felk.cvut.cz/brute/|BRUTE]] **lab08quiz** by April 16, midnight
  * Format: text file, photo of your solution on paper, or PDF, whichever is convenient for you
  * The solution will be discussed in the next lab
  * Students whose family name starts with A to K (inclusive) solve and upload {{ :courses:be5b33kui:labs:weekly:directqevaluation_a_2025.pdf|subject A}}, while students whose family name starts with L to Z solve and upload {{ :courses:be5b33kui:labs:weekly:directqevaluation_b_2025.pdf|subject B}}.
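
For orientation, direct Q-value evaluation in general can be sketched as below. This is a hedged illustration under stated assumptions, not the quiz solution: it assumes the every-visit variant (each occurrence of a state-action pair contributes one sample), a hypothetical episode format of ''(state, action, reward)'' tuples, and made-up toy data. The name ''direct_q_evaluation'' and the default ''gamma'' are choices made for this example. The estimate of Q(s, a) is the average discounted return observed after taking action a in state s.

```python
from collections import defaultdict


def direct_q_evaluation(episodes, gamma=1.0):
    """Average the observed discounted returns for every (state, action) pair.

    Hypothetical input format: each episode is a list of
    (state, action, reward) tuples in the order they were experienced.
    """
    returns_sum = defaultdict(float)
    visits = defaultdict(int)

    for episode in episodes:
        g = 0.0
        # Walk backwards so the return from each step is built incrementally.
        for s, a, r in reversed(episode):
            g = r + gamma * g
            returns_sum[(s, a)] += g
            visits[(s, a)] += 1

    return {sa: returns_sum[sa] / visits[sa] for sa in returns_sum}


# Toy data (made up for illustration only).
episodes = [
    [("A", "right", -1), ("B", "right", 10)],
    [("A", "right", -1), ("A", "right", -1), ("B", "right", 10)],
]
q = direct_q_evaluation(episodes, gamma=1.0)
for key in sorted(q):
    print(key, round(q[key], 3))
```

Unlike the model-based approach, this method never estimates transition probabilities or rewards explicitly; it only averages the returns that were actually observed.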
 

===== Homework =====
  * Submit your solution to the bonus quiz to BRUTE, task ''lab08quiz''.
  * Watch [[https://www.youtube.com/watch?v=r0SytN0sAhI|Mystery game video 1]] and [[https://www.youtube.com/watch?v=uH-DNeTAYMM|Mystery game video 2]]. Think about how much more difficult the reasoning in the tasks above would be for us if the states and actions were randomly ordered and labeled with unrelated names or symbols.
  * If you are done with the MDP semestral task, you can start working on the [[courses:be5b33kui:semtasks:04_rl:start|Reinforcement Learning]] assignment; the deadline is listed in [[https://cw.felk.cvut.cz/upload/|BRUTE]].