# Tutorials

The theoretical problems used as exercises in the tutorials are similar to those that will be used in the final exam. They are also presented in exactly the same format as will be used in the exam including the point allocation (1 point corresponds to 2 minutes of solving time for an A-grade student).

## Tutorial 1

The purpose of this tutorial is to get a good understanding of the basic concepts introduced in Lecture 1.

We first solve two theoretical problems where the utility $U$ is computed from the description of the environment and the agent's actions. The first problem will be worked out by the teacher, the second by the students. If not solved in the allotted time, the exercise is your homework to be completed before the next tutorial, when solutions will be published.

Then we will play with the concepts through coding exercises. Download the exercise code, unzip it and upload to Google Colab. You may try to use a different Jupyter Notebook server instead, however, the code uses a library compiled for Colab, which will likely not work on your architecture. We will try different strategies for exploring the environment and exploiting the results of the exploration. The solution to the first assignment will be demonstrated by the teacher, the rest is to be done by the students. If not solved in the allotted time, the exercise is your homework to be completed before the next tutorial, when solutions will be published.

## Tutorial 2

The purpose of this tutorial is to get an insight into concept learning in the mistake-bound (aka online) learning framework.

In the theoretical part, we will try to solve 5 problems regarding classification, concept learning and logical generalization. Questions 2-4 will be demonstrated by the teacher while Q1 and Q5 will be solved by the students. If not solved in the allotted time, the two problems are your homework to be completed before the next tutorial, when solutions will be published.

In the coding exercises, we will explore two concept-learning algorithms: Winnow for learning monotone disjunctions with observations capturing the complete set of attribute values, and the generalization algorithm for learning conjunctions with possibly incomplete observations corresponding to conjunctions. You will be asked to adapt Winnow for learning monotone conjunctions either with all positive or all negative literals. If not solved in the allotted time, the exercise is your homework to be completed before the next tutorial, when solutions will be published. Optionally, you may also try to implement basis expansion so that Winnow can be used to learn non-monotone concepts, and adapt the generalization algorithm for learning disjunctions. The exercise code is to be handled just like last week.

## Tutorial 3

In this tutorial we will solve theoretical exercises related to the lecture material and the first project assignment will be presented. After reviewing the solutions to last week's exercise problems, we will try–with a little help from the teacher—to solve problems 2, 4 out of this problem list. The list contains one theoretical problem regarding the general framework and 4 other problems related to the mistake-bound learning model, and to two algorithms (Winnow and generalization) under the mistake-bound model assumptions. Questions 1,3,5 are your homework to be completed before the next tutorial, when solutions will be published.

## Tutorial 4

After reviewing the solutions to last week's exercise problems we will work on 5 exercise problems related to Lecture 4 material. Solutions to problems 1 and 2 will be demonstrated by the teacher. The students should make sure to fully understand these two solutions as otherwise, lack of requisite knowledge on logic may be indicated (in that case, please study the first two chapters of ILP). Question 5 will be tackled by the students first independently, and then the solution will be revealed. Questions 3, 4 will be left for independent work; if not solved in the allotted time, they are your homework to be completed before the next tutorial, when solutions will be published.

## Tutorial 5

After reviewing the solutions to last week's exercise problems we will be concerned with problems related to determining a (relative) least general generalization of two FOL clauses, proving a property of relative subsumption, proving the relative equivalence of two FOL clauses we encountered in Lecture 5, and finally proving the mistake bound for learning size-bounded FOL CNF's. Questions 1-3 will be first tackled by the students and then explained by the teacher. Question 4 is similar to an example problem shown in Lecture 5 and will be left as your homework. The solution to Question 5 will be demonstrated by the teacher.

## Tutorial 6

In this tutorial we will solve theoretical exercises related to the lecture material and the second project assignment will be presented. After reviewing the solutions to Question 4 from last week, we will be concerned with exercise problems related to the version-space learning strategy, to a property of the standard learning agent important for proving that mistake bound learning implies PAC-learning, to the necessity of consistent learning for PAC-learning, and to the Vapnik-Chervonenkis dimension. Q1 and Q2 will be first tackled by the students and reviewed by the teacher, Q3 will be demonstrated by the teacher and Q4 will be your homework.

## Tutorial 7

After reviewing the solution to Q4 from last week, we will hold a practical tutorial meant to get more grasp on some concepts defined in Lectures 6 and 7 but also to demonstrate some practical techniques not covered in the lectures. We will use an implementation of an algorithm for learning decision trees, which we defined for Boolean-tuple observations and see how it is applied when the features of the i.i.d. examples are real numbers. By changing the maximum depth of the learned tree $h$, we will control the size of the hypothesis class $\mathcal{H}$ of trees. We first verify that lower training error $\widehat{\text{err}}(h)$ can be achieved with a larger $\mathcal{H}$. Then we will look at two techniques (train-test split and cross-validation) to compute an estimate of the error ${\text{err}}(h)$, which–unlike $\widehat{\text{err}}(h)$–is unbiased. We will also generate the learning curve to confirm that a larger training set leads to a smaller error ${\text{err}}(h)$. Finally, we will investigate the discrepancy between $\widehat{\text{err}}(h)$ and ${\text{err}}(h)$ as a function of $|\mathcal{H}|$ and the size of the training set to confirm that it grows with the former and decays with the latter.

The exercise code is to be un-zipped and uploaded to Google Colab.

## Tutorial 8

This is a practical tutorial aimed at getting familiar with the elements of Bayes Networks and the computations they involve. We will look at a simple implementation of BN's encompassing methods for d-separation checking and for inferring joint and conditional probabilities. We will decide two cases of purported conditional independence, each via three approaches: by the definition of BN, by the definition of d-separation, and by the moralization algorithm, which is essentially an efficient d-separation checker. Afterwards, we use our BN implementation for exemplary probabilistic queries and the students will then work on an improvement of the method for joint probability computation. If not finished during the class, it will be your homework.

The exercise code is to be un-zipped and uploaded to Google Colab.

## Tutorial 9

We will solve 5 problems to cement our understanding of the semantics of Bayes network and our ability to infer conditional independence relations from them with or without employing the d-separation method. In the last problem, we will employ the factor-based method (also termed 'variable elimination' method although the latter is only one of the operations applied in the process) to infer conditional probability and to infer the joint Maximum Aposteriori Probability (MAP) state. The students will try to solve the problems on their own and then the solutions will be demonstrated by the teacher. If we finish only a part of the problem set in the tutorial time, you will finish the remainder as homework and the solutions to it will be revealed next week.

This code for Google Colab can be used to check numerically the solutions to the last problem in the set. It uses the simple Bayes Network class presented in the last tutorial.

## Tutorial 10

In the first reinforcement learning tutorial, we will start with a refresher of the mathematical notation connected with Markov decision processes (MDP). First, we will look at some practical applications of reinforcement learning. While explaining the concepts, we will answer questions that will help our understanding of the discount factor, compare reinforcement learning to planning and game theory fields. Further, we compare different notations and emphasize the connection of algorithms for solving MDPs to other well-known algorithms.

In the second half of the tutorial, we will present the third homework assignment, in which we use the SARSA algorithm to find a policy in the blackjack environment.

You may find the problems we are about to solve in this file. Slides for the tutorial are here.

## Tutorial 11

In this tutorial, you will try to implement a passive reinforcement learning agent. However, to better understand the algorithm itself, it is best to assume nothing about the environment. Therefore, I won't tell you how the environment works, nor the effects of your actions.

The agent will be based on temporal-difference learning. First, we will start with a draft implementation with a constant learning rate $\alpha$, next, we will replace this constant value with an appropriate function of the number of visits of a state so that we are guaranteed convergence of the learning. We will use several figures to show the influence of learning rate values on the convergence of a state utility function. We will also point out several interesting events that may happen during training.

The Python notebook we will use can be uploaded to Google Colab or used in the local environment. Visit also the file with problems to solve and their solutions.

## Tutorial 12

This tutorial will consist of problems to solve by hand. First, we start by reviewing active reinforcement learning algorithms as SARSA and $Q$-learning. Then, we will revisit passive reinforcement learning and show how it works on concrete examples. We will deal with convergence conditions of SARSA and $Q$-learning and show an interaction of the $Q$-learning algorithm with the environment. For self-study, a set of problems dealing with the explore-exploit policy is available.

See the set of problems to solve and their solutions.

## Tutorial 13

Reference solutions and FAQs regarding the two concluded projects (concept learning and FOL).