We don't know the model of the robot agent: it behaves somewhat strangely, the path to the goal is unknown, and there are traps along the way. What do we do?
Learning outcomes
After this practice session, the student
understands what information is available to an agent in an RL task;
estimates the environment characteristics (MDP parameters) from the available data (episodes).
Program
Discussion of the bonus quiz from last week (policy evaluation)
Estimation of environment features (parameters of MDP) from episodes
Introduction of the bonus quiz for this week
Exercise / Solving together
Policy estimation from training episodes (pdf) using model-based learning.
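The model-based approach can be sketched roughly as follows: count the observed transitions to estimate T(s, a, s') and R(s, a, s'), then run value iteration on the estimated MDP and read off a greedy policy. This is only a minimal sketch. The episode data, the tuple format (s, a, s', r), the terminal state name "x", and the function names estimate_model and greedy_policy are all hypothetical and are not taken from the exercise pdf.

```python
from collections import defaultdict

def estimate_model(episodes):
    """Estimate T(s, a, s') and R(s, a, s') by counting observed transitions."""
    counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
    reward_sums = defaultdict(float)                # (s, a, s') -> summed reward
    for episode in episodes:
        for s, a, s_next, r in episode:
            counts[(s, a)][s_next] += 1
            reward_sums[(s, a, s_next)] += r
    T, R = {}, {}
    for (s, a), successors in counts.items():
        total = sum(successors.values())
        for s_next, n in successors.items():
            T[(s, a, s_next)] = n / total                        # relative frequency
            R[(s, a, s_next)] = reward_sums[(s, a, s_next)] / n  # mean reward
    return T, R

def greedy_policy(T, R, gamma=1.0, sweeps=100):
    """Value iteration on the estimated model, then a greedy action per state."""
    outcomes = defaultdict(list)  # (s, a) -> [(probability, reward, s')]
    for (s, a, s_next), p in T.items():
        outcomes[(s, a)].append((p, R[(s, a, s_next)], s_next))
    actions = defaultdict(list)   # s -> actions observed in s
    for s, a in outcomes:
        actions[s].append(a)
    V = defaultdict(float)        # states never left (terminal) keep value 0

    def q_value(s, a):
        return sum(p * (r + gamma * V[s2]) for p, r, s2 in outcomes[(s, a)])

    for _ in range(sweeps):
        new_values = {s: max(q_value(s, a) for a in acts)
                      for s, acts in actions.items()}
        V.update(new_values)      # synchronous update after each sweep
    policy = {s: max(acts, key=lambda a: q_value(s, a))
              for s, acts in actions.items()}
    return policy, dict(V)

# Hypothetical episodes; each step is (state, action, next_state, reward).
episodes = [
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("E", "north", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("E", "north", "C", -1), ("C", "east", "A", -1), ("A", "exit", "x", -10)],
    [("B", "north", "A", -1), ("A", "exit", "x", -10)],
]

T, R = estimate_model(episodes)
policy, V = greedy_policy(T, R)
print(T[("C", "east", "D")], policy["B"])
```

Note that the policy can only choose among actions that were actually tried in the episodes; a state-action pair that never appears in the data has no estimate at all.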
Bonus quiz
Direct Q value evaluation
0.5 points
submit your solution to BRUTE, task lab08quiz, by April 16, midnight
format: text file, photo of your solution on paper, or pdf, whichever is convenient for you
the solution will be discussed in the next lab
Students whose family name starts with A to K (inclusive) must solve and upload subject A, while students whose family name starts with L to Z must solve and upload subject B.
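As a reminder of the general idea behind direct evaluation, here is a minimal sketch under stated assumptions: Q(s, a) is estimated as the average of the discounted returns observed after taking action a in state s, with no model of the environment involved. The episodes, the step format (s, a, r), the every-visit averaging, and the function name direct_q_evaluation are hypothetical and unrelated to the quiz subjects.

```python
from collections import defaultdict

def direct_q_evaluation(episodes, gamma=1.0):
    """Average the observed returns that follow each (state, action) pair."""
    returns = defaultdict(list)  # (s, a) -> list of observed returns
    for episode in episodes:
        g = 0.0
        # Walk backwards so the return accumulates as g = r + gamma * g.
        for s, a, r in reversed(episode):
            g = r + gamma * g
            returns[(s, a)].append(g)  # every-visit averaging (an assumption)
    return {sa: sum(gs) / len(gs) for sa, gs in returns.items()}

# Hypothetical episodes; each step is (state, action, reward).
episodes = [
    [("B", "east", -1), ("C", "east", -1), ("D", "exit", 10)],
    [("B", "east", -1), ("C", "east", -1), ("D", "exit", 10)],
    [("E", "north", -1), ("C", "east", -1), ("D", "exit", 10)],
    [("E", "north", -1), ("C", "east", -1), ("A", "exit", -10)],
]

Q = direct_q_evaluation(episodes)
for state_action, value in sorted(Q.items()):
    print(state_action, value)
```

In this made-up data, B and E both lead to C, yet their estimates differ because each pair is averaged only over its own episodes; direct evaluation does not share information between states.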
Homework
Submit your solution to the bonus quiz to BRUTE, task lab08quiz.
Watch Mystery game video 1 and Mystery game video 2. Think about how much harder the reasoning in the tasks above would be for us if the states and actions were randomly ordered and denoted by unrelated names or symbols.
If you are done with the MDP semestral task, you can start working on the Reinforcement Learning assignment; the deadline is given in BRUTE.