09 Reinforcement Learning II
How not to repeat yourself. We've already found a way, but maybe there's a better place somewhere. Aka exploitation vs. exploration.
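One common way to balance the two is an epsilon-greedy rule: with a small probability the agent tries a random action, otherwise it takes the action that currently looks best. The sketch below is only an illustration; the function name `epsilon_greedy` and its parameters are assumptions made here, not part of the lab materials.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index for one state.

    q_values: list of current Q estimates, one per action (assumed layout).
    epsilon:  probability of exploring, i.e. picking a uniformly random action.
    rng:      any object with random() and randrange(), defaults to the random module.
    """
    if rng.random() < epsilon:
        # Explore: ignore the estimates and try any action.
        return rng.randrange(len(q_values))
    # Exploit: take the action with the highest current estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon = 0` the agent always repeats the best known action; with `epsilon = 1` it acts completely at random. Values in between trade one for the other.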
Quiz for bonus points
Calculate Q values from training episodes using the temporal difference method
0.5 points
submit your solution to BRUTE lab09quiz by April 27, midnight
format: text file, photo of your solution on paper, or pdf, whatever is convenient for you
solution will be discussed in the next lab
quiz assignment: [will be accessible from Monday, April 27]
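For orientation, a temporal difference update of Q values from recorded episodes might look like the sketch below. It assumes the Q-learning form of the update, Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)), with all Q values starting at 0. The episode format, the function name `td_q_values`, and the default `alpha` and `gamma` are assumptions made here; the quiz assignment may specify different conventions.

```python
def td_q_values(episodes, actions, alpha=0.5, gamma=1.0):
    """Estimate Q values from recorded episodes with a Q-learning style TD update.

    episodes: list of episodes; each episode is a list of transitions
              (state, action, next_state, reward), with next_state set to
              None when the episode terminates (assumed format).
    actions:  all actions available in every state (assumed to be the same set).
    """
    q = {}  # (state, action) -> current estimate; missing entries count as 0.0
    for episode in episodes:
        for state, action, next_state, reward in episode:
            if next_state is None:
                # Terminal transition: nothing to bootstrap from.
                target = reward
            else:
                best_next = max(q.get((next_state, b), 0.0) for b in actions)
                target = reward + gamma * best_next
            old = q.get((state, action), 0.0)
            # Move the old estimate a fraction alpha towards the TD target.
            q[(state, action)] = old + alpha * (target - old)
    return q


# Two identical episodes: A -> B with reward 0, then B -> exit with reward 10.
episode = [("A", "right", "B", 0.0), ("B", "right", None, 10.0)]
q = td_q_values([episode, episode], actions=["left", "right"])
```

Running the same episode twice shows how the reward propagates backwards: after the first pass only the last step has a non-zero value, and the earlier step picks it up on the second pass.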
Quiz II / Solving together during interactive lab
Individual Work
Reinforcement learning plus
Reinforcement learning is now a very active area, also supported by rapid progress in deep neural network learning. A few links for further inspiration:
Robotics@google. Well, they can afford many learning episodes and many iterations.
Pong game. Learning to play the very old computer game with the help of AI-Gym.
YT Video