09 Reinforcement Learning II

How not to repeat yourself. We've already found a way, but maybe there's a better place somewhere. Also known as exploitation vs. exploration.
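
One common way to balance the two is epsilon-greedy action selection. The following is a minimal sketch, not the lab's reference solution: the function name `epsilon_greedy`, the dictionary layout of `q_values`, and the parameter `epsilon` are illustrative assumptions.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action from a dict {action: estimated Q-value}.

    Hypothetical helper for illustration; names are not from the lab code.
    """
    if rng.random() < epsilon:
        # Explore: choose any action uniformly at random.
        return rng.choice(list(q_values))
    # Exploit: choose the action with the highest current estimate.
    return max(q_values, key=q_values.get)
```

With `epsilon = 0` the agent always repeats the best action it knows so far; with `epsilon = 1` it ignores its estimates and acts randomly. Values in between trade one for the other.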

Quiz for bonus points

  • Calculate Q values from training episodes using the temporal difference method
  • 0.5 points
  • submit your solution to BRUTE lab09quiz by April 19, midnight
  • format: text file, photo of your solution on paper, pdf - whatever is convenient for you
  • solution will be discussed in the next lab
  • quiz assignment: students whose family name starts with A to L (inclusive) solve and upload subject A, while students whose family name starts with M to Z solve and upload subject B
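
For checking a hand calculation, a temporal difference update over recorded episodes could be sketched as below. This assumes the Q-learning form of the update, Q(s,a) ← Q(s,a) + α (r + γ max_a' Q(s',a') − Q(s,a)), with all Q values starting at zero; the quiz may define the update, α, or γ differently. The function name, the episode format (lists of `(state, action, reward, next_state)` tuples, with `next_state = None` at the end of an episode), and the default parameters are assumptions made for this example.

```python
def td_q_values(episodes, actions, alpha=0.5, gamma=1.0):
    """Estimate Q values from recorded episodes with a Q-learning style TD update.

    episodes: list of episodes, each a list of
              (state, action, reward, next_state) tuples;
              next_state is None when the episode terminates.
    actions:  all actions available to the agent.
    Hypothetical helper for illustration only.
    """
    q = {}
    for episode in episodes:
        for state, action, reward, next_state in episode:
            if next_state is None:
                # Terminal transition: no future value to bootstrap from.
                target = reward
            else:
                # Bootstrap from the best current estimate in the next state.
                best_next = max(q.get((next_state, a), 0.0) for a in actions)
                target = reward + gamma * best_next
            old = q.get((state, action), 0.0)
            # Move the old estimate a fraction alpha towards the TD target.
            q[(state, action)] = old + alpha * (target - old)
    return q
```

Each transition only nudges the estimate towards its target, so the same transition seen in a later episode generally yields a different value; the order of the episodes matters.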

Quiz II / Solving together during interactive lab

Effect of the discount factor on the policy. See the pdf.
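
A toy illustration of the effect, not taken from the pdf: the agent chooses between a small reward available immediately and a larger reward several steps away. The reward sequences and the helper names below are invented for this sketch.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, where the reward at step t is weighted by gamma**t."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

def preferred_path(paths, gamma):
    """Return the name of the path with the highest discounted return."""
    return max(paths, key=lambda name: discounted_return(paths[name], gamma))

# Hypothetical choices: reward 1 right away, or reward 10 after four empty steps.
paths = {"near": [1], "far": [0, 0, 0, 0, 10]}

print(preferred_path(paths, gamma=0.5))  # low gamma favours the nearby reward
print(preferred_path(paths, gamma=0.9))  # high gamma favours the distant reward
```

With a small discount factor the distant reward shrinks below the nearby one, so the greedy policy changes even though the environment itself stays the same.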

Individual Work

Reinforcement learning plus

Reinforcement learning is now a very active area, also supported by rapid progress in deep neural network learning. A few links for further inspiration:

