08 Reinforcement Learning I

We don't know the model of the robot agent: it behaves somewhat strangely, the path to the goal is unknown, and there are traps along the way. What do we do?

Learning outcomes

After this practice session, the student

  • understands what information is available to an agent in an RL task;
  • can estimate the environment characteristics (MDP parameters) from the available data (episodes).
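
One common way to estimate the MDP parameters from episodes is to use relative frequencies for the transition model and averages for the rewards (maximum-likelihood estimates). This is a sketch, and the notation is ours rather than the lab's: N(s,a) is the number of times action a was taken in state s over all episodes, N(s,a,s') is how many of those times the agent landed in s', and r_k is the reward received in the k-th of those transitions.

```latex
\hat{T}(s,a,s') = \frac{N(s,a,s')}{N(s,a)}, \qquad
\hat{R}(s,a,s') = \frac{1}{N(s,a,s')} \sum_{k=1}^{N(s,a,s')} r_k
```

If rewards are deterministic, the average simply equals the single observed reward for that transition.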

Program

  • Discussion of the bonus quiz from last week (policy evaluation)
  • Estimation of environment features (MDP parameters) from episodes
  • Introduction of the bonus quiz for this week

Exercise / Solving together

  • Policy estimation from training episodes (pdf) using model-based learning.
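
A minimal Python sketch of model-based learning, assuming each episode is given as a list of (state, action, next_state, reward) transitions observed while following a fixed policy. The model is first estimated by counting, and the policy is then evaluated on the estimated model by iterative policy evaluation. The episode data, the state names and the helpers `estimate_mdp` and `evaluate_policy` are hypothetical and chosen only for illustration; they are not taken from the lab PDF.

```python
from collections import defaultdict

# Hypothetical data: each episode is a list of (state, action, next_state, reward)
# transitions observed while following a fixed policy; "x" denotes the terminal state.
episodes = [
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("E", "north", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("E", "north", "C", -1), ("C", "east", "A", -1), ("A", "exit", "x", -10)],
]
policy = {"A": "exit", "B": "east", "C": "east", "D": "exit", "E": "north"}


def estimate_mdp(episodes):
    """Relative-frequency estimates of T(s, a, s') and average rewards R(s, a, s')."""
    n_sa = defaultdict(int)          # N(s, a): how often action a was taken in s
    n_sas = defaultdict(int)         # N(s, a, s'): how often that led to s'
    reward_sum = defaultdict(float)  # total reward observed on (s, a, s')
    for episode in episodes:
        for s, a, s_next, r in episode:
            n_sa[(s, a)] += 1
            n_sas[(s, a, s_next)] += 1
            reward_sum[(s, a, s_next)] += r
    T = {sas: n / n_sa[sas[:2]] for sas, n in n_sas.items()}
    R = {sas: reward_sum[sas] / n for sas, n in n_sas.items()}
    return T, R


def evaluate_policy(policy, T, R, gamma=1.0, sweeps=100):
    """Iterative policy evaluation using the estimated model instead of the true one."""
    states = {s for s, _, _ in T} | {s_next for _, _, s_next in T}
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        new_V = {}
        for s in states:
            a = policy.get(s)
            if a is None:
                new_V[s] = 0.0  # terminal state (no action): value stays 0
                continue
            # Bellman expectation backup over the successors observed for (s, a)
            new_V[s] = sum(
                p * (R[(s0, a0, s1)] + gamma * V[s1])
                for (s0, a0, s1), p in T.items()
                if s0 == s and a0 == a
            )
        V = new_V
    return V


T, R = estimate_mdp(episodes)
V = evaluate_policy(policy, T, R)
print(T[("C", "east", "D")], T[("C", "east", "A")])  # 0.75 0.25
print({s: V[s] for s in sorted(V)})
# {'A': -10.0, 'B': 3.0, 'C': 4.0, 'D': 10.0, 'E': 3.0, 'x': 0.0}
```

With γ = 1 this gives, for example, V(C) = 0.75·(−1 + 10) + 0.25·(−1 − 10) = 4. Because the model is shared, B and E both get the value 3: each of them leads to C with reward −1.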

Bonus quiz

  • Direct Q-value evaluation
  • 0.5 points
  • submit your solution to BRUTE, task lab08quiz, by April 16, midnight
  • format: text file, photo of your solution on paper, or PDF, whichever is convenient for you
  • the solution will be discussed in the next lab
  • Students whose family name starts with A to K (inclusive) have to solve and upload subject A; students whose family name starts with L to Z have to solve and upload subject B.
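
A minimal Python sketch of direct evaluation of Q-values. It assumes that direct Q-value evaluation means averaging the return observed after every occurrence of a state-action pair (every-visit averaging, no model), and it uses a hypothetical (state, action, next_state, reward) episode format. It is an illustration only, not the quiz solution, and `direct_q_evaluation` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical data: each episode is a list of (state, action, next_state, reward)
# transitions; "x" denotes the terminal state.
episodes = [
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("E", "north", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("E", "north", "C", -1), ("C", "east", "A", -1), ("A", "exit", "x", -10)],
]


def direct_q_evaluation(episodes, gamma=1.0):
    """Every-visit estimate: Q(s, a) = average return observed after taking a in s."""
    return_sum = defaultdict(float)
    visits = defaultdict(int)
    for episode in episodes:
        g = 0.0
        # Traverse the episode backwards so the return-to-go
        # G_t = r_t + gamma * G_{t+1} is accumulated in a single pass.
        for s, a, _s_next, r in reversed(episode):
            g = r + gamma * g
            return_sum[(s, a)] += g
            visits[(s, a)] += 1
    return {sa: return_sum[sa] / visits[sa] for sa in visits}


Q = direct_q_evaluation(episodes)
for (s, a), q in sorted(Q.items()):
    print(s, a, q)
# A exit -10.0
# B east 8.0
# C east 4.0
# D exit 10.0
# E north -2.0
```

Here Q(E, north) comes out as −2 even though its only observed successor C has an average return of 4. Direct evaluation averages whole returns and does not exploit the fact that both E and B pass through C, so it typically needs more episodes before its estimates become reliable.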

Homework

  • Submit your solution of bonus quiz to BRUTE, task lab08quiz.
  • Watch Mystery game video 1 and Mystery game video 2. Think about how much more difficult the reasoning in the tasks above would be for us if the states and actions were randomly ordered and denoted by unrelated names or symbols.
  • If you are done with the MDP semestral task, you can start working on the Reinforcement Learning assignment (see BRUTE for the deadline).
courses/be5b33kui/labs/weekly/week_08.txt · Last modified: 2026/04/13 15:37 by xposik