09 Reinforcement Learning II
How not to repeat yourself: we have already found a way, but maybe there is a better one somewhere. Also known as exploration vs. exploitation.
Learning outcomes
After this practice session, the student
Program
Q/A session
Discussion of the bonus quiz from last week
Exercise: Influence of the discount factor on the resulting policy
Introduction to RL assignment
Introduction of the bonus quiz for this week
Exercise / Solving together
Effect of the discount factor on the policy. See the pdf.
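To get a feeling for the exercise before opening the pdf, here is a minimal sketch of how the discount factor can flip the preferred choice. The two options (a small reward one step away, a large reward five steps away) and all names in the code are made up for illustration; they are not taken from the pdf.

```python
def discounted_return(reward, steps, gamma):
    """Value of a single reward collected after `steps` moves.

    The reward of the first move is not discounted, so the factor is
    gamma ** (steps - 1).
    """
    return gamma ** (steps - 1) * reward


# Hypothetical situation: name -> (reward, number of steps needed to reach it).
OPTIONS = {"near": (1.0, 1), "far": (10.0, 5)}


def best_option(gamma):
    """Pick the option with the highest discounted return for this gamma."""
    return max(OPTIONS, key=lambda name: discounted_return(*OPTIONS[name], gamma))


for gamma in (0.5, 0.9):
    values = {name: discounted_return(*OPTIONS[name], gamma) for name in OPTIONS}
    print(gamma, values, "->", best_option(gamma))
```

With a low discount factor the agent is short-sighted and prefers the nearby small reward; with a discount factor close to 1 the distant large reward wins.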
Bonus quiz
Calculate Q-values from training episodes using the temporal difference method
0.5 points
submit your solution to BRUTE, task lab09quiz; the deadline is given in BRUTE
format: text file, photo of your solution on paper, or pdf, whichever is convenient for you
the solution will be discussed at the next lab
Students whose family name starts with A to K (inclusive) solve and upload subject A; students whose family name starts with L to Z solve and upload subject B.
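As a reminder of the mechanics, the sketch below applies a temporal difference update to recorded transitions. It assumes the Q-learning form of the update, Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)), with all Q-values initialised to 0. The episodes, the action set, and the values of alpha and gamma are hypothetical and are not the quiz data; use the values given in your subject.

```python
def td_q_values(episodes, actions, alpha=0.5, gamma=0.9):
    """Process transitions (state, action, reward, next_state) in order.

    next_state is None when the transition ends the episode.
    Returns a dict mapping (state, action) to the learned Q-value.
    """
    q = {}  # unseen (state, action) pairs are treated as 0.0
    for episode in episodes:
        for state, action, reward, next_state in episode:
            if next_state is None:
                best_next = 0.0  # no future value after a terminal transition
            else:
                best_next = max(q.get((next_state, a), 0.0) for a in actions)
            old = q.get((state, action), 0.0)
            # TD update: move the old estimate towards the one-step target.
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q


# Hypothetical data: two identical episodes A -> B -> terminal.
ACTIONS = ["left", "right"]
EPISODES = [
    [("A", "right", 0.0, "B"), ("B", "right", 10.0, None)],
    [("A", "right", 0.0, "B"), ("B", "right", 10.0, None)],
]

q = td_q_values(EPISODES, ACTIONS)
for key in sorted(q):
    print(key, q[key])
```

Note that the order of the transitions matters: in the first episode Q(A, right) stays at 0 because nothing is known about B yet, and the reward only propagates back to A in the second episode.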
Homework
Reinforcement learning plus
Reinforcement learning is currently a very active area, helped by rapid progress in deep neural network learning. A few links for further inspiration:
Robotics@google. Well, they can afford many learning episodes and many iterations.
Pong game. Learning to play the very old computer game with the help of AI-Gym.
YT Video