09 Reinforcement Learning II

How not to repeat yourself. We've already found a way, but maybe there's a better one somewhere. Aka exploitation vs. exploration.

Exercise for bonus points

  • Calculate Q-values from training episodes using the temporal difference method
  • 0.5 points
  • submit your solution to BRUTE lab09quiz by April 24, midnight
  • format: text file, photo of your solution on paper, or pdf, whichever is convenient for you
  • the solution will be discussed in the next lab
  • Students whose family name starts with A to K (inclusive) solve and upload subject A, while students whose family name starts with L to Z solve and upload subject B.
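
For orientation, the temporal difference update of Q-values from recorded episodes can be sketched as below. This is only a minimal sketch, assuming the usual Q-learning form of the update, Q(s,a) ← Q(s,a) + α·(r + γ·max Q(s',a') − Q(s,a)). The function name `td_q_values`, the transition format, the two-state episodes, and the values of `alpha` and `gamma` are all made up for illustration; they are not the data or the settings of subject A or B.

```python
def td_q_values(episodes, actions, alpha=0.5, gamma=0.9):
    """Apply the TD (Q-learning) update to every recorded transition, in order.

    episodes: list of episodes, each a list of
              (state, action, reward, next_state) tuples;
              next_state is None when the transition ends the episode.
    actions:  all actions available in any state (hypothetical, for the max).
    """
    q = {}  # (state, action) -> Q-value; missing entries count as 0
    for episode in episodes:
        for state, action, reward, next_state in episode:
            if next_state is None:
                best_next = 0.0  # nothing follows a terminal transition
            else:
                best_next = max(q.get((next_state, a), 0.0) for a in actions)
            target = reward + gamma * best_next
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (target - old)
    return q


# Illustrative data only: the same two-step episode observed twice.
example_episodes = [
    [("A", "right", 0.0, "B"), ("B", "right", 1.0, None)],
    [("A", "right", 0.0, "B"), ("B", "right", 1.0, None)],
]
example_q = td_q_values(example_episodes, actions=["left", "right"])
print(example_q)
```

Note how the reward at the end of the episode only reaches state A in the second pass: after the first episode Q(A, right) is still 0, because Q(B, right) had not been updated yet when the first transition was processed.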

Exercise II / Solving together during interactive lab

Effect of the discount factor on the policy. See the pdf.
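
To get a feel for the effect before opening the exercise, here is a minimal sketch under assumed numbers (it is not the scenario from the pdf): an agent can either take a small reward that is one step away or a large reward that is five steps away, and a reward collected on step n is weighted by γ^(n−1). The function names, rewards, and distances are hypothetical.

```python
def discounted_return(reward, steps, gamma):
    """Value of a single reward collected on move number `steps`.

    The reward of the first move is not discounted, so the weight
    is gamma ** (steps - 1).
    """
    return gamma ** (steps - 1) * reward


def preferred_exit(gamma, near=(1.0, 1), far=(10.0, 5)):
    """Return which exit a greedy agent heads for under discount `gamma`.

    near, far: (reward, number of moves needed) pairs, assumed values.
    """
    near_value = discounted_return(*near, gamma)
    far_value = discounted_return(*far, gamma)
    return "near" if near_value >= far_value else "far"


for gamma in (0.3, 0.5, 0.9, 1.0):
    print(gamma, preferred_exit(gamma))
```

With these assumed numbers the preference flips where γ^4 · 10 = 1, that is around γ ≈ 0.56: a more short-sighted agent settles for the nearby small reward, a more far-sighted one walks to the distant large reward.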

Individual Work

Reinforcement learning plus

Reinforcement learning is now a very active area, also supported by rapid progress in deep neural network learning. A few links for further inspiration:

courses/be5b33kui/labs/weekly/week_09.txt · Last modified: 2023/04/24 14:49 by gamafili