Lectures are given in person. Videos will be provided, but not necessarily directly from the lectures (the form is at the discretion of the individual lecturers).

In the lecture descriptions below, we refer to this **supplementary course material**:

**Relevant**

- RL: *R. S. Sutton, A. G. Barto: Reinforcement learning: An introduction. MIT Press, 2018.*
- NLP: *D. Jurafsky & J. H. Martin: Speech and Language Processing - 3rd edition draft*
- COLT: *M. J. Kearns, U. Vazirani: An Introduction to Computational Learning Theory, MIT Press 1994*

The RL and NLP books are available online.

You are strongly discouraged from using this course's materials from previous years, as they are likely to cause confusion.

The RL part of the course is heavily based on the RL course of Prof Emma Brunskill. The relevant lectures from Prof Brunskill's course are: Lecture 1, Lecture 2, Lecture 3, Lecture 4, Lecture 5, Lecture 6, Lecture 11.

There are nice materials by Volodymyr Kuleshov and Stefano Ermon on probabilistic graphical models (for the Bayesian networks part of the course): https://ermongroup.github.io/cs228-notes/. The relevant chapters are: https://ermongroup.github.io/cs228-notes/representation/directed/, https://ermongroup.github.io/cs228-notes/inference/ve/, https://ermongroup.github.io/cs228-notes/inference/sampling/.

The NLP part of the course is heavily based on the NLP course(s) by Dan Jurafsky (Stanford), following his book Speech and Language Processing (see NLP above), particularly its 3rd edition draft (the 2nd edition is insufficient!). The relevant chapters for us are 3, 6, 7 and 9. There are also some nice related materials and videos.

For the COLT part: besides the monograph by Kearns and Vazirani listed above, the Wikipedia page has pointers to two COLT survey papers (Angluin, Haussler) which are relevant to the PAC part. There are also external courses with lecture material available; for example, 8803 Machine Learning Theory at Georgia Tech covers all COLT topics of SMU (there are subtle differences in the algorithms and proofs). Video footage of the lectures is available here.

**Slides:** Slides

**Videos:** Markov Processes, Markov Reward Processes, Markov Decision Processes. Proofs are not part of the videos. They are not compulsory, although they are in the slides and will be discussed in the lecture, time permitting (*update after the lecture: we did not have time to go over the proofs*).

**A Note about organization of the first four lectures on reinforcement learning:** *These first four lectures will be about reinforcement learning. You will have three different sources of materials to choose from:*

*1. Lectures given in-person on Mondays.*

*2. Videos that will be uploaded before each lecture on the course web page.*

*3. Course videos by Professor Emma Brunskill (the relevant lectures from her course are linked to above).*

*Option 1 is the traditional one. Option 2 is kind of a minimal version: I will try to make the videos concise and shorter than the lectures, but I will still aim for them to cover all the important material. We will not use lecture videos from last year; new ones will be recorded. Option 3 will probably be the most rewarding if you want to learn reinforcement learning in greater depth (and I recommend it - Prof. Brunskill’s course is great).*

*Now regarding the first lecture, it is a brief recap of Markov decision processes (MDPs) because not all students of the course have the same background from their previous studies. If you feel confident about your knowledge of MDPs, feel free to skip the lecture (but still come to the exercise session).*

**Note:** *This lecture is heavily based on a lecture by Prof Emma Brunskill (all potential errors are likely mine).*

**Relevant videos from Prof Brunskill's course:** Lecture 1, Lecture 2.

**Slides:** Slides

**Videos:** Short Recap, Problem Statement and Statistical Properties of Estimators, Monte-Carlo Policy Evaluation Methods, Temporal Difference Policy Evaluation Methods.

**Important:** *There is a typo on several of the slides in the video in the pseudocode of First-Visit and Every-Visit Monte-Carlo algorithms. On the line where G(s) is updated, there should be g_{i,t} instead of g_{i,1}. This has been fixed in the slides, but not in the videos.*
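To make the correction concrete, here is a minimal Python sketch of First-Visit and Every-Visit Monte-Carlo policy evaluation. It is not the pseudocode from the slides: the function name `mc_policy_evaluation`, the episode format (a list of `(state, reward)` pairs, with the reward received on leaving that state) and the variable names are assumptions made for illustration. The point to notice is that the return added to G(s) is the one computed from time step t of episode i, i.e. g_{i,t}.

```python
from collections import defaultdict


def mc_policy_evaluation(episodes, gamma=1.0, first_visit=True):
    """Estimate V(s) from sampled episodes (illustrative sketch).

    Each episode is a list of (state, reward) pairs; the reward is the one
    received when leaving that state (an assumed convention).
    """
    total_return = defaultdict(float)  # G(s): sum of returns counted for s
    visit_count = defaultdict(int)     # N(s): number of counted visits to s

    for episode in episodes:
        # Backward pass: returns[t] holds g_{i,t}, the return from step t on.
        returns = [0.0] * len(episode)
        g = 0.0
        for t in range(len(episode) - 1, -1, -1):
            _, reward = episode[t]
            g = reward + gamma * g
            returns[t] = g

        seen = set()
        for t, (state, _) in enumerate(episode):
            if first_visit and state in seen:
                continue  # First-Visit: only the first occurrence counts
            seen.add(state)
            visit_count[state] += 1
            total_return[state] += returns[t]  # g_{i,t}, not g_{i,1}

    return {s: total_return[s] / visit_count[s] for s in visit_count}


episode = [("A", 1.0), ("B", 2.0), ("A", 3.0)]
print(mc_policy_evaluation([episode], first_visit=True))
print(mc_policy_evaluation([episode], first_visit=False))
```

For this single episode with gamma = 1, the returns from the three time steps are 6, 5 and 3, so First-Visit estimates V(A) = 6 while Every-Visit averages both visits to A and gives 4.5. Using g_{i,1} instead would add 6 for every visit, which is exactly the typo described above.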

In this lecture, we will talk about model-free policy evaluation methods. We will end the lecture with a short discussion of model-free control, which is where we will start next week.

**Note:** *This lecture is heavily based on a lecture by Prof Emma Brunskill (all potential errors are likely mine).*

**Relevant videos from Prof Brunskill's course:** Lecture 3

**After-lecture notes:** One of you (sorry, I do not know your name, but thank you for the comment!) pointed out that the computation on the slides showing that Every-Visit Monte-Carlo is biased needs to take into account the fact that episodes with more occurrences of state 1 contribute more to the estimate. That is true! However, the example shown in the lecture shows that Every-Visit Monte-Carlo is biased for a single episode (which is already enough to show that the estimator is biased!). In general, the bias here depends on the number of episodes that we use, which is actually not uncommon in statistics (think of variance estimation with the factor 1/N instead of 1/(N-1)). Back to our problem from the slide… If you want to compute the bias of the estimator when using N episodes, you can do it (but it becomes a bit complicated): you can draw N copies of state 1 connected as follows: 1 → 1' → 1'' → … → 1^(N) → 0, with the probability of moving from one state to the next equal to p. Note that this approach only works because, in this example, we are using gamma = 1.
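The single-episode case can be checked numerically. The sketch below uses a toy chain that is an assumption and may differ in its details from the example on the slides: from state 1 the process moves to the terminal state 0 with probability p and stays in state 1 otherwise, every step taken from state 1 gives reward 1, and gamma = 1. Under these assumptions an episode with k visits to state 1 has returns k, k-1, …, 1 from the successive visits, so First-Visit uses k and Every-Visit uses their average (k+1)/2. The function name `single_episode_expectations` is hypothetical.

```python
def single_episode_expectations(p, max_len=10_000):
    """Expected single-episode MC estimates of V(1) in the assumed toy chain.

    Assumptions (not taken from the slides): reward 1 per step in state 1,
    probability p of moving to the terminal state 0, gamma = 1.
    The infinite sum over episode lengths is truncated at max_len.
    """
    true_value = 1.0 / p  # expected number of steps spent in state 1
    first_visit = 0.0
    every_visit = 0.0
    for k in range(1, max_len + 1):
        prob = (1.0 - p) ** (k - 1) * p        # P(episode has k visits)
        first_visit += prob * k                # return from the first visit
        every_visit += prob * (k + 1) / 2.0    # mean of returns k, ..., 1
    return true_value, first_visit, every_visit


true_value, fv, ev = single_episode_expectations(0.5)
print(true_value, fv, ev)
```

With p = 0.5 the true value is 2, the expected First-Visit estimate is also 2, and the expected Every-Visit estimate is 1.5, so in this assumed chain the Every-Visit estimator from one episode underestimates V(1).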

**Slides:** Slides

**Videos:** Short Recap, Model-Free Control - Problem Statement, Monte-Carlo On Policy Iteration, SARSA, Q-Learning.

**Note:** *This lecture is heavily based on a lecture by Prof Emma Brunskill (all potential errors are likely mine).*

**Relevant videos from Prof Brunskill's course:** Lecture 4

**Slides:** Slides

**Videos:** Intro + Short Recap, RL with Value Function Approximation - Problem Description, Some More Background (mostly SGD), Policy Approximation with VFA, Control with VFA, Bandits.

**Note:**

**Relevant videos from Prof Brunskill's course:** Lecture 5, Lecture 6, Lecture 11.

**Slides:** slides

**Video:** Video (**Tip:** you may want to watch this video with increased speed, e.g., 1.25x or even 1.5x.)

**Relevant additional materials for the BN part:** materials by Volodymyr Kuleshov and Stefano Ermon on probabilistic graphical models (https://ermongroup.github.io/cs228-notes/). The relevant chapter for BN 1 is: https://ermongroup.github.io/cs228-notes/representation/directed/.

**Slides:** slides

**Videos:** Recap + Variable Elimination, Forward Sampling and Rejection Sampling, Importance Sampling (sorry for the worse technical quality of this one)

**Relevant additional materials for the BN part:** materials by Volodymyr Kuleshov and Stefano Ermon on probabilistic graphical models (https://ermongroup.github.io/cs228-notes/). The relevant chapters for BN 2 are: https://ermongroup.github.io/cs228-notes/inference/ve/, https://ermongroup.github.io/cs228-notes/inference/sampling/.

courses/smu/lectures.txt · Last modified: 2023/05/09 10:57 by souregus