====== 09 Reinforcement Learning II ======

How not to repeat yourself. We've already found a way to the goal, but maybe there's a better one somewhere. Aka exploitation vs. exploration.
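One common way to balance the two is an epsilon-greedy rule: with a small probability the agent tries a random action, otherwise it takes the action that currently looks best. The snippet below is only a minimal sketch under that assumption; the function name ''epsilon_greedy'' and the list-of-Q-values representation are illustrative and are not taken from the course code.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index for one state.

    q_values: list of current Q estimates, one per action (hypothetical layout).
    epsilon:  probability of exploring, between 0 and 1.
    rng:      random source; pass random.Random(seed) for reproducible runs.
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimate (first one on ties).
    return q_values.index(max(q_values))

# With epsilon = 0 the choice is purely greedy.
greedy_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

With ''epsilon = 0'' the agent only repeats what it already knows; with ''epsilon = 1'' it never uses what it has learned. Values in between trade one off against the other.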

===== Exercise for bonus points =====

  * Calculate Q values from training episodes using the temporal difference method
  * 0.5 points
  * submit your solution to [[https://cw.felk.cvut.cz/brute/|BRUTE]] **lab09quiz** by April 23, midnight
  * format: text file, photo of your solution on paper, pdf - whatever is convenient for you
  * solution will be discussed in the next lab
  * Students whose family name starts with A to K (inclusive) have to solve and upload {{ :courses:be5b33kui:labs:weekly:Qlearning_a_2024.pdf |subject A}}, while students whose family name starts with L to Z have to solve and upload {{ :courses:be5b33kui:labs:weekly:Qlearning_b_2024.pdf |subject B}}.
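As a reminder of the mechanics, here is a minimal sketch of a temporal difference update, assuming the Q-learning form Q(s,a) ← Q(s,a) + α (r + γ max Q(s',a') − Q(s,a)). The states, actions, rewards, α and γ below are made up for illustration and are unrelated to the quiz subjects; use the values given in your assignment sheet.

```python
from collections import defaultdict

def td_q_update(q, episode, alpha=0.5, gamma=1.0, actions=("left", "right")):
    """Apply one Q-learning style TD update per transition of an episode.

    q:       dict mapping (state, action) to the current estimate.
    episode: list of (state, action, reward, next_state) tuples;
             next_state is None when the transition ends the episode.
    """
    for state, action, reward, next_state in episode:
        if next_state is None:
            target = reward  # terminal transition: no future value
        else:
            best_next = max(q[(next_state, a)] for a in actions)
            target = reward + gamma * best_next
        # Move the old estimate a fraction alpha towards the target.
        q[(state, action)] += alpha * (target - q[(state, action)])
    return q

# Hypothetical two-step episode, replayed twice, starting from all-zero Q values.
q = defaultdict(float)
episode = [("A", "right", 0.0, "B"), ("B", "right", 1.0, None)]
td_q_update(q, episode)
td_q_update(q, episode)
```

After the first pass only the last transition has a non-zero value; the second pass propagates it one step back towards the start, which is why the order of the training episodes matters.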

===== Exercise II / Solving together during interactive lab =====

Effect of the discount factor on the policy. {{:courses:be5b33kui:labs:weekly:Discount_factor_example.pdf |See pdf}}

> {{page>courses:be5b33kui:internal:quizzes#effect_of_discount_factor_on_estimating_the_policy}}
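To get a feel for the effect before the discussion, the sketch below compares two hypothetical exits: a near one with a small reward and a far one with a large reward. The rewards, step counts, and function names are assumptions chosen for illustration, not the numbers from the pdf; the only thing taken as given is that a reward received after k steps is weighted by γ^(k-1).

```python
def discounted_value(reward, steps, gamma):
    """Value of a single reward received on the given step (the first step is undiscounted)."""
    return gamma ** (steps - 1) * reward

def preferred_exit(gamma, near=(1.0, 1), far=(10.0, 5)):
    """Return which hypothetical exit has the higher discounted value.

    near, far: (reward, number of steps needed to reach the exit).
    """
    near_reward, near_steps = near
    far_reward, far_steps = far
    near_value = discounted_value(near_reward, near_steps, gamma)
    far_value = discounted_value(far_reward, far_steps, gamma)
    return "far" if far_value > near_value else "near"

for gamma in (0.5, 0.9):
    print(gamma, preferred_exit(gamma))
```

With these numbers a patient agent (γ = 0.9) heads for the far exit, while a short-sighted one (γ = 0.5) settles for the near exit, so the same environment yields different policies.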

===== Individual Work =====

Work on the [[courses:be5b33kui:semtasks:04_rl:start|Reinforcement learning assignment]].

===== Reinforcement learning plus =====

Reinforcement learning is a very active research area, driven in part by rapid progress in deep neural networks. A few links for further inspiration:

  * [[https://www.youtube.com/watch?v=SH3bADiB7uQ|Table tennis robot player]]. Starting from imitation, then generalizing through RL.
  * [[https://research.google.com/teams/brain/robotics/|Robotics@google]]. Well, they can afford many learning episodes and many iterations ;-)
  * [[https://medium.com/@dhruvp/how-to-write-a-neural-network-to-play-pong-from-scratch-956b57d4f6e0|Pong game]]. Learning to play the very old computer game with the help of OpenAI Gym. [[https://www.youtube.com/watch?time_continue=6&v=YOW8m2YGtRg|YT Video]]