Lectures are given in person. Videos will be provided, but not necessarily directly from the lectures (the form is at the discretion of the individual lecturers).

In the lecture descriptions below, we refer to this **supplementary course material**:

**Relevant**

- RL: *R. S. Sutton, A. G. Barto: Reinforcement learning: An introduction. MIT Press, 2018.*
- NLP: *D. Jurafsky & J. H. Martin: Speech and Language Processing - 3rd edition draft*
- COLT: *M. J. Kearns, U. Vazirani: An Introduction to Computational Learning Theory, MIT Press 1994*

The RL and NLP books are available online.

You are strongly discouraged from using this course's materials from previous years, as doing so is likely to cause confusion.

The RL part of the course is heavily based on the RL course of Prof. Emma Brunskill. The relevant lectures from Prof. Brunskill's course are: Lecture 1, Lecture 2, Lecture 3, Lecture 4, Lecture 5, Lecture 6, Lecture 11.

The NLP part of the course is heavily based on the NLP course(s) by Dan Jurafsky (Stanford), following his book Speech and Language Processing (see NLP above), particularly its 3rd edition draft (the 2nd edition is insufficient!). The relevant chapters for us are 3, 6, 7 and 9. There are also some nice related materials and videos.

For the COLT part: besides the monograph by Kearns and Vazirani listed above, the Wikipedia page has pointers to two COLT survey papers (Angluin, Haussler) which are relevant to the PAC part. There are also external courses with lecture material available; for example, 8803 Machine Learning Theory at Georgia Tech covers all COLT topics of SMU (there are subtle differences in the algorithms and proofs). Video footage of the lectures is available here.

**Slides:** Slides

**Videos:** Markov Processes, Markov Reward Processes, Markov Decision Processes. Proofs are not part of the videos; they are not compulsory, although they are in the slides and will be discussed in the lecture, time permitting (*update after the lecture: we did not have time to go over the proofs*).

**A Note about organization of the first six lectures on reinforcement learning:** *The first six lectures will be about reinforcement learning. You will have three different sources of materials to choose from:*

*1. Lectures given in-person on Mondays.*

*2. Videos that will be uploaded before each lecture on the course web page.*

*3. Course videos by Professor Emma Brunskill (the relevant lectures from her course are linked to above).*

*Option 1 is the traditional one. Option 2 is a minimal version: I will try to keep the videos concise and shorter than the lectures, while still covering all the important material. We will not reuse last year's lecture videos; new ones will be recorded. Option 3 will probably be the most rewarding if you want to learn reinforcement learning in greater depth (and I recommend it: Prof. Brunskill’s course is great).*

*The first lecture is a brief recap of Markov decision processes (MDPs), because not all students in the course have the same background from their previous studies. If you feel confident about your knowledge of MDPs, feel free to skip the lecture (but still come to the exercise session).*

**Note:** *This lecture is heavily based on a lecture by Prof Emma Brunskill (all potential errors are likely mine).*

**Relevant videos from Prof Brunskill's course:** Lecture 1, Lecture 2.

**Typos in the videos:** There is a typo on slide 27, titled “Computing Value Function (2/3)”: the factor gamma by which the state transition matrix should be multiplied is missing. This has been corrected in the slides but not in the video.
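To make the role of the missing gamma concrete, here is a minimal sketch, assuming the slide refers to the Bellman equation of a Markov reward process, V = R + γPV (equivalently V = (I − γP)^{-1} R). The function name `mrp_value` and the two-state chain are hypothetical illustrations, not taken from the slides.

```python
def mrp_value(P, R, gamma, tol=1e-10, max_iters=100_000):
    """Iterate V <- R + gamma * P V until the largest change is below tol.

    P is a row-stochastic transition matrix (list of lists),
    R is the vector of expected immediate rewards, one per state.
    """
    n = len(R)
    V = [0.0] * n
    for _ in range(max_iters):
        # gamma multiplies the transition matrix term P V
        V_new = [
            R[s] + gamma * sum(P[s][s2] * V[s2] for s2 in range(n))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical chain: state 0 always moves to state 1, state 1 loops on itself.
P = [[0.0, 1.0],
     [0.0, 1.0]]
R = [1.0, 2.0]
V = mrp_value(P, R, gamma=0.5)
# V[1] solves V1 = 2 + 0.5 * V1, so V1 = 4; then V0 = 1 + 0.5 * 4 = 3.
```

Without the factor gamma the iteration would not converge for this chain, since state 1 would keep accumulating reward 2 forever.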

**Slides:** Slides

**Videos:** Short Recap, Problem Statement and Statistical Properties of Estimators, Monte-Carlo Policy Evaluation Methods, Temporal Difference Policy Evaluation Methods.

**Important:** *There is a typo on several of the slides in the video in the pseudocode of First-Visit and Every-Visit Monte-Carlo algorithms. On the line where G(s) is updated, there should be g_{i,t} instead of g_{i,1}. This has been fixed in the slides, but not in the videos.*
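The corrected update can be sketched as follows. This is a minimal illustration, assuming each episode is a list of (state, reward) pairs where the reward is the one received at that time step; the function name `mc_policy_evaluation` and the sample episode are hypothetical and not taken from the slides.

```python
def mc_policy_evaluation(episodes, gamma, first_visit=True):
    """Estimate V(s) by averaging sampled returns (First-Visit or Every-Visit)."""
    N = {}  # number of counted visits per state
    G = {}  # sum of returns per state
    for episode in episodes:
        # Backward pass: returns[t] is the return g_{i,t} from time step t onward.
        returns = [0.0] * len(episode)
        g = 0.0
        for t in reversed(range(len(episode))):
            _, reward = episode[t]
            g = reward + gamma * g
            returns[t] = g

        seen = set()
        for t, (state, _) in enumerate(episode):
            if first_visit and state in seen:
                continue  # First-Visit counts only the first occurrence
            seen.add(state)
            N[state] = N.get(state, 0) + 1
            # Use the return from time t (g_{i,t}), not from the episode start (g_{i,1}).
            G[state] = G.get(state, 0.0) + returns[t]
    return {s: G[s] / N[s] for s in N}


# Hypothetical episode: A (reward 1), B (reward 0), A (reward 2), undiscounted.
episode = [("A", 1.0), ("B", 0.0), ("A", 2.0)]
v_first = mc_policy_evaluation([episode], gamma=1.0, first_visit=True)
v_every = mc_policy_evaluation([episode], gamma=1.0, first_visit=False)
# v_first: A -> 3.0, B -> 2.0;  v_every: A -> 2.5, B -> 2.0
```

With g_{i,1} in place of g_{i,t}, state B would incorrectly be credited with the return 3.0 from the start of the episode.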

In this lecture, we will talk about model-free policy evaluation methods. We will end the lecture with a short discussion of model-free control, which is where we will start next week.

**Note:** *This lecture is heavily based on a lecture by Prof Emma Brunskill (all potential errors are likely mine).*

**Relevant videos from Prof Brunskill's course:** Lecture 3

**Slides:** Slides

**Videos:** Short Recap, Model-Free Control - Problem Statement, Monte-Carlo On Policy Iteration, SARSA, Q-Learning.

**Note:** *This lecture is heavily based on a lecture by Prof Emma Brunskill (all potential errors are likely mine). There was a typo in the pseudocode of SARSA: the action sampled inside the loop should be sampled from \pi(s_{t+1}) instead of \pi(s_t). The error remains in the video.*
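The corrected sampling step can be sketched as below. This is a minimal tabular SARSA with an epsilon-greedy policy, written under assumptions: the environment interface `env_step(s, a) -> (reward, next_state, done)`, the helper names, and the two-state chain are all hypothetical and not taken from the slides.

```python
import random


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Sample an action from the epsilon-greedy policy derived from Q."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))


def sarsa(env_step, start_state, actions, episodes, alpha, gamma, epsilon, seed=0):
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        s = start_state
        a = epsilon_greedy(Q, s, actions, epsilon, rng)
        while True:
            r, s_next, done = env_step(s, a)
            q = Q.get((s, a), 0.0)
            if done:
                Q[(s, a)] = q + alpha * (r - q)
                break
            # The next action is sampled from pi(s_{t+1}), i.e. in the NEXT state.
            a_next = epsilon_greedy(Q, s_next, actions, epsilon, rng)
            target = r + gamma * Q.get((s_next, a_next), 0.0)
            Q[(s, a)] = q + alpha * (target - q)
            s, a = s_next, a_next
    return Q


# Hypothetical chain: 0 -> 1 with reward 0, then 1 -> terminal with reward 1.
def chain_step(state, action):
    if state == 0:
        return 0.0, 1, False
    return 1.0, None, True


Q = sarsa(chain_step, start_state=0, actions=["go"], episodes=2,
          alpha=1.0, gamma=0.5, epsilon=0.0)
# After two episodes: Q[(1, "go")] = 1.0 and Q[(0, "go")] = 0.5 * 1.0 = 0.5.
```

Sampling the next action in s_{t+1} is what makes the update on-policy: the target uses the action the agent will actually take next.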

**Relevant videos from Prof Brunskill's course:** Lecture 4

**Slides:** Slides

**Videos:** Intro + Short Recap, RL with Value Function Approximation - Problem Description, Some More Background (mostly SGD), Policy Approximation with VFA, Control with VFA.

**Note:** *This lecture is heavily based on a lecture by Prof Emma Brunskill (all potential errors are likely mine).*

**Relevant videos from Prof Brunskill's course:** Lecture 5, Lecture 6, Lecture 11.

**Slides:** Slides

**Videos:** Bandits. There is no video for the proofs part of the lecture, which is not compulsory material and will not appear in the exam.

**Note:**

**Relevant videos from Prof Brunskill's course:** Lecture 11.

**Slides:** Slides

**Videos:** Coming soon.

**Note:**

**Relevant videos from Prof Brunskill's course:** Lecture 12.

courses/smu/lectures.txt · Last modified: 2024/04/22 15:03 by souregus