====== 08 Reinforcement Learning I ======

We don't know the model of the robot-agent; it behaves somewhat strangely, the path to the goal is unknown, and there are traps along the way. What do we do?

===== Learning outcomes =====

After this practice session, the student
  * understands what information is available to an agent in an RL task;
  * estimates the environment characteristics (MDP parameters) from the available data (episodes).

===== Program =====

  * Discussion of the bonus quiz from last week (policy evaluation)
  * Estimation of environment features (parameters of the MDP) from episodes
  * Introduction of the bonus quiz for this week

===== Exercise / Solving together =====

  * Policy estimation from training episodes {{ :courses:be5b33kui:labs:weekly:policy_estimation_example.pdf |pdf}} using model-based learning.

> {{page>courses:be5b33kui:internal:quizzes#policy_estimation_from_training_episodes}}

===== Bonus quiz =====

  * Direct Q value evaluation
  * 0.5 points
  * submit your solution to [[https://cw.felk.cvut.cz/brute/|BRUTE]] **lab08quiz** by April 16, midnight
  * format: text file, photo of your solution on paper, or PDF, whichever is convenient for you
  * the solution will be discussed at the next lab
  * Students whose family name starts with A to K (inclusive) have to solve and upload {{ :courses:be5b33kui:labs:weekly:directqevaluation_a_2025.pdf|subject A}}, while students whose family name starts with L to Z have to solve and upload {{ :courses:be5b33kui:labs:weekly:directqevaluation_b_2025.pdf|subject B}}.

===== Homework =====

  * Submit your solution of the bonus quiz to BRUTE, task ''lab08quiz''.
  * Watch [[https://www.youtube.com/watch?v=r0SytN0sAhI|Mystery game video 1]] and [[https://www.youtube.com/watch?v=uH-DNeTAYMM|Mystery game video 2]]. Think about how much more difficult the reasoning in the tasks above would be for us if the states and actions were randomly ordered and denoted by unrelated names or symbols.
  * If you are done with the MDP semestral task, you can start working on the [[courses:be5b33kui:semtasks:04_rl:start|Reinforcement Learning]] assignment; see [[https://cw.felk.cvut.cz/upload/|BRUTE]] for the deadline.
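
The two ideas practised in this lab, model-based estimation of the MDP parameters and direct Q value evaluation, can be sketched in a few lines of Python. This is only an illustrative sketch, not the course's reference solution. The episode format (a list of ''(state, action, next_state, reward)'' tuples), the function names ''estimate_model'' and ''direct_q_evaluation'', and the toy states ''A'', ''B'', ''EXIT'' are all assumptions made for this example. Direct evaluation is implemented here as an every-visit average of the observed returns; the quiz may define it differently.

```python
from collections import Counter, defaultdict


def estimate_model(episodes):
    """Model-based learning: estimate T(s, a, s') and R(s, a, s') by counting.

    Assumed (hypothetical) episode format: a list of
    (state, action, next_state, reward) tuples.
    """
    counts = defaultdict(Counter)      # (s, a) -> Counter of observed s'
    reward_sums = defaultdict(float)   # (s, a, s') -> sum of observed rewards
    for episode in episodes:
        for s, a, s_next, r in episode:
            counts[(s, a)][s_next] += 1
            reward_sums[(s, a, s_next)] += r

    T, R = {}, {}
    for (s, a), successors in counts.items():
        total = sum(successors.values())
        for s_next, n in successors.items():
            T[(s, a, s_next)] = n / total                       # relative frequency
            R[(s, a, s_next)] = reward_sums[(s, a, s_next)] / n  # mean reward
    return T, R


def direct_q_evaluation(episodes, gamma=1.0):
    """Direct evaluation: average the discounted return observed after each (s, a).

    Every occurrence of (s, a) contributes one sample (every-visit averaging,
    an assumption of this sketch).
    """
    returns = defaultdict(list)
    for episode in episodes:
        g = 0.0
        # Walk the episode backwards so the return accumulates in one pass.
        for s, a, _s_next, r in reversed(episode):
            g = r + gamma * g
            returns[(s, a)].append(g)
    return {sa: sum(gs) / len(gs) for sa, gs in returns.items()}


# Toy data, invented for illustration only.
episodes = [
    [("A", "right", "B", -1), ("B", "right", "EXIT", 10)],
    [("A", "right", "A", -1), ("A", "right", "B", -1), ("B", "right", "EXIT", 10)],
]

T, R = estimate_model(episodes)
Q = direct_q_evaluation(episodes)
print(T[("A", "right", "B")])   # estimated probability of reaching B from A
print(Q[("A", "right")])        # averaged return for taking "right" in A
```

Note the difference between the two approaches: ''estimate_model'' recovers the transition and reward model, which still has to be solved (for example by value iteration) to obtain a policy, whereas ''direct_q_evaluation'' skips the model and averages returns directly.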