09 Reinforcement Learning II

How not to repeat yourself. We've already found a way, but maybe there's a better place somewhere. Also known as exploitation vs. exploration.

Exercise for bonus points

Calculate Q-values from training episodes using the temporal difference method.
0.5 points.
Submit your solution to BRUTE lab09quiz by April 23, midnight.
Format: text file, photo of your solution on paper, or PDF, whichever is convenient for you.
The solution will be discussed in the next lab.
Students whose family name starts with A to K (inclusive) solve and upload subject A, while students whose family name starts with L to Z solve and upload subject B.
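To check a hand calculation, a temporal difference update can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the exercise solution: it assumes the Q-learning form of the update (maximum over next actions), episodes given as lists of `(state, action, reward, next_state)` tuples with `None` marking a terminal state, and all Q-values starting at 0. The function name and the values of `alpha` and `gamma` are hypothetical; use those given in your subject.

```python
from collections import defaultdict


def td_q_from_episodes(episodes, actions, alpha=0.5, gamma=1.0):
    """Replay recorded episodes and apply the TD update to each transition.

    episodes: list of episodes, each a list of (s, a, r, s_next) tuples;
              s_next is None when the transition ends the episode (assumption).
    actions:  all actions available in every state (assumption).
    """
    q = defaultdict(float)  # Q(s, a), initialised to 0 for unseen pairs
    for episode in episodes:
        for s, a, r, s_next in episode:
            if s_next is None:
                best_next = 0.0  # no future value after a terminal transition
            else:
                best_next = max(q[(s_next, b)] for b in actions)
            sample = r + gamma * best_next
            # Move the old estimate a fraction alpha towards the new sample.
            q[(s, a)] += alpha * (sample - q[(s, a)])
    return q


# Toy data (made up for illustration): the same two-step episode seen twice.
episode = [("A", "right", 0.0, "B"), ("B", "right", 10.0, None)]
q = td_q_from_episodes([episode, episode], actions=["left", "right"])
```

Note how the reward reaches state A only in the second pass: the first pass updates `Q(B, right)`, and only then can `Q(A, right)` pick it up through the bootstrapped term.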

Exercise II / Solving together during interactive lab

Effect of the discount factor on the policy. See the PDF.
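The intuition can be previewed with a small calculation. The sketch below compares two made-up reward sequences (not taken from the PDF): a small reward right away and a larger reward several steps later. The function names and numbers are hypothetical and serve only as an illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, with the reward at step t weighted by gamma**t."""
    return sum(r * gamma**t for t, r in enumerate(rewards))


def preferred(options, gamma):
    """Name of the reward sequence with the highest discounted return."""
    return max(options, key=lambda name: discounted_return(options[name], gamma))


# Hypothetical choices: +1 immediately, or +10 after three empty steps.
options = {
    "near": [1.0],
    "far": [0.0, 0.0, 0.0, 10.0],
}

choice_low = preferred(options, gamma=0.3)   # far is worth 10 * 0.3**3 = 0.27
choice_high = preferred(options, gamma=0.9)  # far is worth 10 * 0.9**3 = 7.29
```

With a low discount factor the distant reward shrinks below the immediate one, so the short-sighted choice wins; with a discount factor close to 1 the larger, later reward dominates.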

Individual Work

Work on the Reinforcement learning assignment.

Reinforcement learning plus

Reinforcement learning is now a very active area, also supported by rapid progress in deep neural network learning. A few links for further inspiration:

Table tennis robot player. Starting from imitation, then generalizing through RL.
Robotics@google. Well, they can afford many learning episodes and many iterations.
Pong game. Learning to play the very old computer game with the help of AI-Gym. YT Video
