====== 09 Reinforcement Learning II ======

How not to repeat yourself. We've already found a way, but maybe there's a better place somewhere. Also known as exploitation vs. exploration.

===== Exercise for bonus points =====

  * Calculate Q values from training episodes using the temporal difference method
  * 0.5 points
  * submit your solution to [[https://cw.felk.cvut.cz/brute/|BRUTE]] **lab09quiz** by April 23, midnight
  * format: text file, photo of your solution on paper, or pdf, whichever is convenient for you
  * the solution will be discussed at the next lab
  * Students whose family name starts with A to K (inclusive) have to solve and upload {{ :courses:be5b33kui:labs:weekly:Qlearning_a_2024.pdf |subject A}}, while students whose family name starts with L to Z have to solve and upload {{ :courses:be5b33kui:labs:weekly:Qlearning_b_2024.pdf |subject B}}.

===== Exercise II / Solving together during interactive lab =====

Effect of the discount factor on the policy. {{:courses:be5b33kui:labs:weekly:Discount_factor_example.pdf |See pdf}}

> {{page>courses:be5b33kui:internal:quizzes#effect_of_discount_factor_on_estimating_the_policy}}

===== Individual Work =====

Work on the [[courses:be5b33kui:semtasks:04_rl:start|Reinforcement learning assignment]].

===== Reinforcement learning plus =====

Reinforcement learning is now a very active area, also supported by rapid progress in deep neural network learning. A few links for further inspiration:

  * [[https://www.youtube.com/watch?v=SH3bADiB7uQ|Table tennis robot player]]. Starting from imitation, then generalizing through RL.
  * [[https://research.google.com/teams/brain/robotics/|Robotics@google]]. Well, they can afford many learning episodes and many iterations ;-)
  * [[https://medium.com/@dhruvp/how-to-write-a-neural-network-to-play-pong-from-scratch-956b57d4f6e0|Pong game]]. Learning to play the very old computer game with the help of OpenAI Gym. [[https://www.youtube.com/watch?time_continue=6&v=YOW8m2YGtRg|YT Video]]
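
As a companion to the bonus exercise on temporal difference learning, here is a minimal sketch of how Q values could be computed from recorded episodes. It assumes the standard Q-learning update, Q(s,a) ← Q(s,a) + α (r + γ max Q(s',a') − Q(s,a)), with all Q values initialised to zero. The states, actions, rewards, learning rate ''alpha'' and discount factor ''gamma'' are made up for illustration and are not taken from the subject A or subject B sheets. The function name ''td_q_values'' is likewise hypothetical.

```python
def td_q_values(episodes, actions, alpha=0.5, gamma=0.9):
    """Estimate Q values from recorded episodes using the Q-learning TD update.

    Each episode is a list of (state, action, reward, next_state) tuples;
    next_state is None when the transition ends the episode.
    """
    q = {}  # (state, action) -> estimate; missing entries count as 0.0
    for episode in episodes:
        for state, action, reward, next_state in episode:
            if next_state is None:
                best_next = 0.0  # nothing follows a terminal transition
            else:
                best_next = max(q.get((next_state, a), 0.0) for a in actions)
            old = q.get((state, action), 0.0)
            target = reward + gamma * best_next  # TD target
            q[(state, action)] = old + alpha * (target - old)
    return q


# Hypothetical data: the same two-step episode observed twice.
ACTIONS = ["left", "right"]
EPISODES = [
    [("A", "right", -1, "B"), ("B", "right", 10, None)],
    [("A", "right", -1, "B"), ("B", "right", 10, None)],
]

q = td_q_values(EPISODES, ACTIONS, alpha=0.5, gamma=0.9)
print(q[("A", "right")], q[("B", "right")])  # 1.5 7.5
```

After the first episode the estimates are Q(A,right) = -0.5 and Q(B,right) = 5.0. The second pass moves Q(A,right) towards -1 + 0.9 · 5.0 = 3.5, giving 1.5. Re-running with a smaller ''gamma'' shows how the discount factor shrinks the value that propagates back from the later reward.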