08 Reinforcement Learning I

We do not know the model of the robot agent: it behaves somewhat unpredictably, the path to the goal is unknown, and there are traps along the way. What can we do?

Learning outcomes

After this practice session, the student

understands what information is available to an agent in an RL task;
is able to estimate the environment characteristics (MDP parameters) from the available data (episodes).
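A common way to do this estimation is to count how often each outcome was observed. This sketch assumes that each episode is recorded as a sequence of transitions (s, a, s', r):

$$\hat{T}(s,a,s') = \frac{N(s,a,s')}{N(s,a)}, \qquad \hat{R}(s,a,s') = \frac{1}{N(s,a,s')} \sum_{i=1}^{N(s,a,s')} r_i$$

Here N(s,a) is the number of times action a was taken in state s across all episodes, N(s,a,s') is how many of those times the agent ended up in s', and r_i are the rewards observed on those transitions. For example, if a was taken 4 times in s and led to s' 3 times, then \hat{T}(s,a,s') = 3/4.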

Program

Discussion of the bonus quiz from last week (policy evaluation)
Estimation of environment features (MDP parameters) from episodes
Introduction to this week's bonus quiz
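As a reminder of last week's topic: policy evaluation computes the values of a fixed policy π. The usual notation is assumed here (transition model T, reward R, discount γ). Starting from V^π_0(s) = 0, it iterates

$$V^{\pi}_{k+1}(s) = \sum_{s'} T(s,\pi(s),s')\,\bigl[R(s,\pi(s),s') + \gamma\,V^{\pi}_{k}(s')\bigr]$$

If T and R are unknown, the same iteration can be run on estimates of them obtained from episodes.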

Exercise / Solving together

Policy estimation from training episodes using model-based learning (pdf).
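Below is a minimal sketch of model-based learning, not the exercise's reference solution. It first estimates T and R by counting the transitions in the episodes. It then runs iterative policy evaluation on the estimated model. Several things are hypothetical and chosen only for illustration: the episode format (a list of (s, a, s_next, r) tuples), the helper names estimate_mdp and evaluate_policy, and the toy states "A", "B", and "exit". States for which the policy defines no action are treated as terminal, with value 0.

```python
from collections import defaultdict


def estimate_mdp(episodes):
    """Estimate T(s, a, s') and R(s, a, s') by counting observed transitions."""
    n_sa = defaultdict(int)      # N(s, a): how often a was taken in s
    n_sas = defaultdict(int)     # N(s, a, s'): how often that led to s'
    r_sum = defaultdict(float)   # total reward observed on (s, a, s')
    for episode in episodes:
        for s, a, s_next, r in episode:
            n_sa[(s, a)] += 1
            n_sas[(s, a, s_next)] += 1
            r_sum[(s, a, s_next)] += r
    T = {key: n / n_sa[key[:2]] for key, n in n_sas.items()}
    R = {key: r_sum[key] / n for key, n in n_sas.items()}
    return T, R


def evaluate_policy(T, R, policy, gamma=1.0, iters=100):
    """Iterative policy evaluation on the estimated model.

    States missing from `policy` are treated as terminal (value 0).
    """
    V = defaultdict(float)
    for _ in range(iters):
        new_V = defaultdict(float)
        for s, a in policy.items():
            new_V[s] = sum(p * (R[key] + gamma * V[key[2]])
                           for key, p in T.items() if key[:2] == (s, a))
        V = new_V
    return dict(V)


# Hypothetical toy episodes generated by the policy "always go right".
episodes = [
    [("A", "right", "B", -1), ("B", "right", "exit", 10)],
    [("A", "right", "B", -1), ("B", "right", "exit", 10)],
    [("A", "right", "A", -1), ("A", "right", "B", -1), ("B", "right", "exit", -10)],
]

T, R = estimate_mdp(episodes)
print(T[("A", "right", "B")])           # 0.75
V = evaluate_policy(T, R, {"A": "right", "B": "right"}, gamma=1.0)
print(round(V["A"], 6), round(V["B"], 6))  # 2.0 3.333333
```

Counting with dictionaries keyed by (s, a, s') keeps the sketch independent of how states are named. The linear scan over T in evaluate_policy is fine for a toy example, but it would be replaced by an index over (s, a) for larger problems.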

Bonus quiz

Direct Q-value evaluation
0.5 points
submit your solution to BRUTE, task lab08quiz, by April 16, midnight
format: text file, photo of your solution on paper, or pdf, whichever is convenient for you
the solution will be discussed at the next lab
Students whose family names start with A to K (inclusive) solve and upload assignment A; students whose family names start with L to Z solve and upload assignment B.
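Below is a generic sketch of direct Q-value evaluation, and it is not the quiz solution. It estimates Q(s, a) as the average discounted return observed after taking action a in state s. The sketch makes several assumptions: it uses every-visit averaging (first-visit averaging is a common alternative, and the quiz may specify one or the other), and it uses the same hypothetical (s, a, s_next, r) episode format. The function name direct_q_evaluation is also hypothetical.

```python
from collections import defaultdict


def direct_q_evaluation(episodes, gamma=1.0):
    """Every-visit estimate of Q(s, a) as the mean observed discounted return."""
    returns_sum = defaultdict(float)
    visits = defaultdict(int)
    for episode in episodes:
        g = 0.0
        # Walk the episode backwards so g is the discounted return from this step on.
        for s, a, _s_next, r in reversed(episode):
            g = r + gamma * g
            returns_sum[(s, a)] += g
            visits[(s, a)] += 1
    return {sa: returns_sum[sa] / visits[sa] for sa in visits}


# Hypothetical toy episodes.
episodes = [
    [("A", "right", "B", -1), ("B", "right", "exit", 10)],
    [("A", "right", "B", -1), ("B", "right", "exit", 10)],
    [("A", "right", "A", -1), ("A", "right", "B", -1), ("B", "right", "exit", -10)],
]

q = direct_q_evaluation(episodes, gamma=1.0)
print(q[("A", "right")])               # -1.25
print(round(q[("B", "right")], 4))     # 3.3333
```

Direct evaluation needs no model, but each (s, a) is estimated only from the returns that happened to follow it. On a small data set like this one, the result can therefore differ noticeably from a value computed on an estimated model.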

Homework

Submit your solution of bonus quiz to BRUTE, task lab08quiz.
Watch Mystery game video 1 and Mystery game video 2. Think about how much more difficult the reasoning in the tasks above would be for us if the states and actions were randomly ordered and denoted by unrelated names or symbols.
If you have finished the MDP semestral task, you can start working on the Reinforcement Learning assignment (see BRUTE for the deadline).