09 Reinforcement Learning II

How not to repeat yourself. We have already found a way that works, but maybe there is a better one somewhere else. Also known as exploration vs. exploitation.

Learning outcomes

After this practice session, the student

  • understands how a discount factor (or other MDP parameters) can affect the resulting policy.

Program

  • Q/A session
  • Discussion of the bonus quiz from last week
  • Exercise: Influence of the discount factor on the resulting policy
  • Introduction to RL assignment
  • Introduction of the bonus quiz for this week

Exercise / Solving together

Effect of the discount factor on the policy. See the PDF.
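To get a feel for the effect before opening the exercise, here is a minimal sketch on a toy problem that is not taken from the exercise PDF. It assumes a hypothetical deterministic corridor of five states, where entering the left end pays a small reward (1) and entering the right end pays a large reward (10). The function name `greedy_policy`, the rewards, and the corridor layout are all illustrative assumptions. The policy is obtained by plain value iteration.

```python
def greedy_policy(gamma, n_states=5, r_left=1.0, r_right=10.0, sweeps=100):
    """Value iteration on a hypothetical corridor MDP (illustrative only).

    States 0 and n_states-1 are terminal. Entering state 0 yields r_left,
    entering state n_states-1 yields r_right, every other move yields 0.
    Actions are -1 (left) and +1 (right); transitions are deterministic.
    """
    V = [0.0] * n_states  # terminal states keep value 0

    def q(s, a):
        s_next = s + a
        if s_next == 0:
            reward = r_left
        elif s_next == n_states - 1:
            reward = r_right
        else:
            reward = 0.0
        return reward + gamma * V[s_next]

    # Repeated Bellman backups over the non-terminal states.
    for _ in range(sweeps):
        for s in range(1, n_states - 1):
            V[s] = max(q(s, -1), q(s, +1))

    # Greedy policy with respect to the converged values.
    return {s: ("left" if q(s, -1) > q(s, +1) else "right")
            for s in range(1, n_states - 1)}


print(greedy_policy(0.2))  # short-sighted agent
print(greedy_policy(0.9))  # far-sighted agent
```

With a small discount factor the agent in state 1 takes the nearby small reward, because the large reward two steps further away is worth only 10·γ² to it. With a discount factor close to 1 it walks right from every state.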

Bonus quiz

  • Calculate Q values from training episodes using the temporal difference method
  • 0.5 points
  • submit your solution to BRUTE lab09quiz by April 15, midnight
  • format: text file, photo of your solution on paper, or PDF, whichever is convenient for you
  • the solution will be discussed in the next lab
  • Students whose family name starts with A to K (inclusive) have to solve and upload subject A, while students whose family name starts with L to Z have to solve and upload subject B.
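
For checking a hand calculation, a temporal difference update over recorded episodes can be sketched as follows. This is only an illustration under stated assumptions: it uses the Q-learning form of the update, Q(s,a) ← Q(s,a) + α·(r + γ·max Q(s',a') − Q(s,a)), which may differ from the variant the quiz asks for. The episode format (lists of `(s, a, r, s_next)` tuples with `None` marking a terminal successor), the function name `td_q_values`, the values of α and γ, and the sample data are all hypothetical.

```python
from collections import defaultdict


def td_q_values(episodes, actions, alpha=0.5, gamma=1.0):
    """Apply the Q-learning TD update to recorded transitions, in order.

    episodes: list of episodes, each a list of (s, a, r, s_next) tuples;
              s_next is None when the transition ends the episode.
    actions:  all actions available in every state (an assumption).
    """
    Q = defaultdict(float)  # all Q values start at 0
    for episode in episodes:
        for s, a, r, s_next in episode:
            if s_next is None:
                best_next = 0.0  # nothing follows a terminal transition
            else:
                best_next = max(Q[(s_next, a2)] for a2 in actions)
            # Move Q(s, a) a fraction alpha towards the TD target.
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return dict(Q)


# Hypothetical data: the same two-step episode observed twice.
episode = [("A", "right", 0.0, "B"), ("B", "right", 10.0, None)]
Q = td_q_values([episode, episode], actions=["left", "right"])
print(Q[("A", "right")], Q[("B", "right")])
```

After the first pass only Q(B, right) changes (to 5). The second pass then propagates that value back to Q(A, right), which is why the order of the episodes matters in a TD calculation.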

Homework

Reinforcement learning plus

Reinforcement learning is now a very active research area, driven in part by rapid progress in deep neural networks. A few links for further inspiration:

courses/be5b33kui/labs/weekly/week_09.txt · Last modified: 2026/04/20 16:31 by xposik