====== 10 Reinforcement Learning III ======
How does Q-learning work in environments with continuous states and/or actions? This topic goes beyond the core course content.

===== Learning outcomes =====
After this practice session, the student
  * knows about linear regression as a possible tool to model V- and Q-functions;
  * understands in principle how approximative Q-learning works.
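
For reference, the linear model behind both outcomes can be written as follows. This is the standard least-squares formulation. The symbols (features f_i, weights w_i, design matrix X, targets y) are assumed here and may differ from the notation used in the exercise PDF.

```latex
\hat{Q}(s,a) = \sum_{i=1}^{n} w_i f_i(s,a) = \mathbf{w}^\top \mathbf{f}(s,a),
\qquad
\hat{V}(s) = \mathbf{w}^\top \mathbf{f}(s)

\mathbf{w}^* = \arg\min_{\mathbf{w}} \sum_{j=1}^{m} \bigl(y_j - \mathbf{w}^\top \mathbf{f}(s_j,a_j)\bigr)^2
             = (X^\top X)^{-1} X^\top \mathbf{y}
```

Here row j of X is f(s_j, a_j)^T and y_j is the target value of sample j. The closed form assumes that X^T X is invertible.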

===== Program =====
  * Q/A
  * Discussion of the bonus quiz from last week
  * Exercise 1: Approximation by minimizing the least-squares error (LSQ)
  * Exercise 2: Approximative Q-learning
  * Introduction of the bonus quiz for this week
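
Exercise 1 fits a function by minimizing the squared error between model predictions and target values. The following is a minimal sketch with NumPy. It assumes hypothetical noise-free 1-D samples and the hypothetical feature vector f(s) = [1, s, s^2]. The data and features used in the exercise PDF may differ.

```python
import numpy as np

def features(s: float) -> np.ndarray:
    """Hypothetical feature vector f(s) = [1, s, s^2] for a scalar state s."""
    return np.array([1.0, s, s * s])

def fit_lsq(states, targets) -> np.ndarray:
    """Return weights w minimizing sum_j (targets[j] - w . f(states[j]))^2."""
    X = np.array([features(s) for s in states])  # m x n design matrix
    y = np.asarray(targets, dtype=float)          # m target values
    # lstsq solves the least-squares problem without explicitly inverting X^T X
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict(w: np.ndarray, s: float) -> float:
    """Approximate value V(s) = w . f(s)."""
    return float(features(s) @ w)

# Illustrative samples generated from V(s) = 2 + 0.5 s - 0.1 s^2 (no noise)
states = [0, 1, 2, 3, 4, 5]
targets = [2 + 0.5 * s - 0.1 * s * s for s in states]
w = fit_lsq(states, targets)
print(w)  # weights close to [2.0, 0.5, -0.1]
```

Because the samples lie exactly on a quadratic, the fit recovers the generating weights. With noisy targets, the same call returns the weights that minimize the squared error instead.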


===== Exercise / Solving together =====

  * Approximation by minimizing the least-squares error (LSQ)
  * Approximative Q-learning
  * {{:courses:be5b33kui:labs:weekly:learning_by_approximation.pdf |(see pdf)}}
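
In approximative Q-learning, the Q-table is replaced by weights over features, and every observed transition nudges the weights. The following is a minimal sketch. It assumes the common linear formulation Q(s,a) = sum_i w_i f_i(s,a) with the update w_i <- w_i + alpha * delta * f_i(s,a), where delta is the temporal-difference error. The class name ApproxQAgent, the feature function, and the toy 1-D task are all hypothetical and are not taken from the exercise PDF.

```python
from typing import Any, Callable, Dict, Sequence

FeatureFn = Callable[[Any, Any], Dict[str, float]]

class ApproxQAgent:
    """Q-learning with a linear model Q(s, a) = sum_i w_i * f_i(s, a)."""

    def __init__(self, feature_fn: FeatureFn, actions: Sequence[Any],
                 alpha: float = 0.1, gamma: float = 0.9) -> None:
        self.feature_fn = feature_fn   # (state, action) -> {feature name: value}
        self.actions = list(actions)   # finite action set; states may be continuous
        self.alpha = alpha             # learning rate
        self.gamma = gamma             # discount factor
        self.w: Dict[str, float] = {}  # one weight per feature name, default 0.0

    def q(self, s: Any, a: Any) -> float:
        """Approximate Q-value as a weighted sum of features."""
        return sum(self.w.get(k, 0.0) * v for k, v in self.feature_fn(s, a).items())

    def best_action(self, s: Any) -> Any:
        """Greedy action with respect to the current approximation."""
        return max(self.actions, key=lambda a: self.q(s, a))

    def update(self, s: Any, a: Any, r: float, s_next: Any,
               terminal: bool = False) -> float:
        """Apply one update for transition (s, a, r, s_next); return the TD error."""
        future = 0.0 if terminal else max(self.q(s_next, b) for b in self.actions)
        delta = r + self.gamma * future - self.q(s, a)
        # Each weight moves in proportion to its feature value (dQ/dw_i = f_i(s, a)).
        for k, v in self.feature_fn(s, a).items():
            self.w[k] = self.w.get(k, 0.0) + self.alpha * delta * v
        return delta


# Hypothetical continuous 1-D task: position x in [0, 1], actions move by -0.1 or +0.1,
# reward +1 on the transition that reaches x = 1 (terminal state).
def features(x: float, a: float) -> Dict[str, float]:
    return {"bias": 1.0, "x_next": min(max(x + a, 0.0), 1.0)}

agent = ApproxQAgent(features, actions=[-0.1, 0.1], alpha=0.5, gamma=0.9)
delta = agent.update(0.9, 0.1, r=1.0, s_next=1.0, terminal=True)
print(delta, agent.w)  # 1.0 {'bias': 0.5, 'x_next': 0.5}
```

Since only the weights are stored, the agent can evaluate states it has never visited. This is what makes the approach usable for continuous states, where a table is impossible.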

===== Bonus quiz =====

  * Calculate state values under a random-walk policy
  * 0.5 points
  * Submit your solution to [[https://cw.felk.cvut.cz/brute/|BRUTE]] as **lab10quiz**; the deadline is listed in BRUTE.
  * Format: text file, photo of your handwritten solution, or PDF, whichever is most convenient for you.
  * The solution will be discussed in the next lab.
  * Students whose family name starts with A to K (inclusive) solve and upload {{ :courses:be5b33kui:labs:weekly:random_walk.pdf |subject A}}; students whose family name starts with L to Z solve and upload {{ :courses:be5b33kui:labs:weekly:random_walk.pdf |subject B}}.

> {{page>courses:be5b33kui:internal:quizzes#state_values_for_a_random_walk}}
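
The general technique for computing state values under a fixed random-walk policy is iterative policy evaluation. The following is a sketch. It does not solve subject A or B. Instead, it assumes the classic 5-state chain from Sutton and Barto's textbook: terminal states at both ends, equiprobable left/right moves, reward +1 only when entering the right terminal, and no discounting. The actual environment, rewards, and discount factor of the quiz are defined in the PDF. The function name evaluate_random_walk is hypothetical.

```python
from typing import List

def evaluate_random_walk(n: int = 5, gamma: float = 1.0, tol: float = 1e-10) -> List[float]:
    """Iterative policy evaluation for the equiprobable random-walk policy.

    Assumed chain: terminal states 0 and n + 1, non-terminal states 1..n.
    Each step moves left or right with probability 0.5; entering the right
    terminal gives reward +1, every other transition gives 0.
    """
    V = [0.0] * (n + 2)  # terminal values stay 0
    while True:
        delta = 0.0
        for s in range(1, n + 1):
            new_v = 0.0
            for s_next in (s - 1, s + 1):
                r = 1.0 if s_next == n + 1 else 0.0
                new_v += 0.5 * (r + gamma * V[s_next])  # Bellman expectation backup
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v  # in-place sweep, reuses freshly updated neighbors
        if delta < tol:
            return V[1:n + 1]

values = evaluate_random_walk()
print([round(v, 4) for v in values])  # [0.1667, 0.3333, 0.5, 0.6667, 0.8333]
```

For this assumed chain, the exact values are V(s) = s/6. Each value equals the probability of ending in the right terminal when starting from state s.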

===== Homework =====
  * Work on the [[courses:be5b33kui:semtasks:04_rl:start|Reinforcement learning assignment]].