09 Reinforcement Learning II

How not to repeat yourself. We've already found a way, but maybe there's a better one somewhere. Aka exploitation vs. exploration.
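One common way to balance the two is epsilon-greedy action selection. The sketch below is only an illustration, not the method required in the assignment: the function name `epsilon_greedy` and its parameters are made up for this example, and actions are assumed to be list indices.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index for one state.

    q_values: list of current Q estimates, one per action (hypothetical layout).
    epsilon:  probability of exploring, i.e. picking a uniformly random action.
    rng:      anything with random() and randrange(), e.g. random.Random(seed).
    """
    if rng.random() < epsilon:
        # Explore: ignore the estimates and try any action.
        return rng.randrange(len(q_values))
    # Exploit: take the action with the highest estimate (first one on ties).
    return q_values.index(max(q_values))
```

With `epsilon = 0` the agent always repeats the best action found so far; with `epsilon = 1` it acts purely at random. Values in between, often decreased over time, trade one for the other.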

Quiz for bonus points

Calculate Q-values from training episodes using the temporal difference method
0.5 points
submit your solution to BRUTE lab09quiz by April 27, midnight
format: text file, photo of your solution on paper, pdf - whatever is convenient for you
solution will be discussed at the next lab
quiz assignment: [will be accessible from Monday, April 27]
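
For orientation, a temporal difference update of Q-values from recorded episodes can be sketched as below. This is a minimal sketch under stated assumptions, not the quiz solution: it assumes the Q-learning form of the update, Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)), with all Q-values initialised to 0. The quiz may prescribe a different variant, learning rate, or discount. The function name `td_q_update` and the episode layout (lists of `(state, action, reward, next_state)` tuples, with `None` marking a terminal next state) are hypothetical.

```python
from collections import defaultdict


def td_q_update(episodes, alpha=0.5, gamma=1.0):
    """Replay episodes in order and apply one TD update per transition."""
    q = defaultdict(float)  # unseen (state, action) pairs start at 0
    for episode in episodes:
        for state, action, reward, next_state in episode:
            # Best estimate among actions already tried in next_state.
            # Simplification: untried actions are ignored; terminal or
            # never-visited next states contribute 0.
            next_best = max(
                (v for (s, _a), v in q.items() if s == next_state),
                default=0.0,
            )
            target = reward + gamma * next_best
            q[(state, action)] += alpha * (target - q[(state, action)])
    return dict(q)


# Made-up data: the same two-step episode observed twice.
episode = [("A", "right", 0.0, "B"), ("B", "right", 1.0, None)]
q = td_q_update([episode, episode], alpha=0.5, gamma=1.0)
```

In this toy run the reward reaches state A only in the second episode, because the first pass has no estimate for B yet when A is updated.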

Quiz II / Solving together during interactive lab

Effect of discount factor on policy. Lab exercise on discount factors (pdf)
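
As a warm-up for the exercise, the following sketch compares two made-up reward sequences under different discount factors. The path names and reward values are invented for illustration and are not taken from the pdf.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a reward sequence starting at t = 0."""
    return sum(gamma**t * r for t, r in enumerate(rewards))


def preferred_path(paths, gamma):
    """Name of the path with the highest discounted return."""
    return max(paths, key=lambda name: discounted_return(paths[name], gamma))


# Hypothetical choice: a small reward right away vs. a large one after 5 steps.
paths = {
    "near": [1.0],
    "far": [0.0, 0.0, 0.0, 0.0, 10.0],
}

patient = preferred_path(paths, gamma=0.9)    # far: 10 * 0.9**4 is about 6.56 > 1
impatient = preferred_path(paths, gamma=0.5)  # near: 10 * 0.5**4 = 0.625 < 1
```

The rewards are identical in both calls; only the discount factor changes which path, and therefore which policy, looks better.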

Individual Work

Work on the Reinforcement learning assignment.

Reinforcement learning plus

Reinforcement learning is now a very active area, also supported by rapid progress in deep neural network learning. A few links for further inspiration:

Table tennis robot player. Starting from imitation, then generalizing through RL.
Robotics@google. Well, they can afford many learning episodes and many iterations.
Pong game. Learning to play the very old computer game with the help of AI-Gym. YT Video
