13./14. Lets prepare for exam - Exercises
In this tutorial, we will go over several heuristics of classical planning and probabilistic planning as a preparation for exam. If you have your own exercises or you struggle with any topic, prepare it for the tutorial.
Delete relaxation heuristics
A relaxed planning task of STRIPS task $\Pi = \langle F, A, s_0, g \rangle$, where $a = \langle pre_a, add_a, del_a \rangle$ is a task $Pi^+ = \langle F, A^+, s_0, g \rangle$, where the delete effects of actions are omitted, i.e., $a^+ = \langle pre_{a^+}, add_{a^+}, \emptyset \rangle$.
The optimal heuristic of $\Pi$ is denoted $h^*$ and for $\Pi^+$. However, the heuristic values cannot be computed in polynomial time, therefore, we use estimates: $h^{max}, h^{add}, h^{ff}$. In this tutorial, we consider $\Pi = \Pi^+$ and $h^{max} \leq h^+ = h^* \leq h^{add}$ and $h^+ = h^* \leq h^{ff}$ hold.
Exercise: Consider the following STRIPS task $\Pi = \langle F, A, s_0, g \rangle$, where
$F=\{a,b,c,d,e,f,g,h\}$,
$s_0 = \{a\}$,
$g = \{c,d,e,f,g\}$,
$A = \{a_1,a_2,a_3,a_4,a_5,a_6\}$ defined below:
action | pre | add | cost |
$a_1$ | $\{a\}$ | $\{b,c\}$ | $1$ |
$a_2$ | $\{a,c\}$ | $\{d\}$ | $1$ |
$a_3$ | $\{b,c\}$ | $\{e\}$ | $1$ |
$a_4$ | $\{b\}$ | $\{f\}$ | $1$ |
$a_5$ | $\{d\}$ | $\{e,f\}$ | $1$ |
$a_6$ | $\{d\}$ | $\{g\}$ | $1$ |
Compute the perfect heuristic $h^*$, $h^{max}$, $h^{add}$, and $h^{ff}$ for the initial state, i.e., $h^*(s_0), h^{max}(s_0),h^{add}(s_0)$ and $h^{ff}(s_0)$.
Solution
$s_0=\{a\} \stackrel{a_1}{\longrightarrow} \{a,b,c\} \stackrel{a_2}{\longrightarrow} \{a,b,c,d\} \stackrel{a_6}{\longrightarrow} \{a,b,c,d,g\} \stackrel{a_5}{\longrightarrow} \{a,b,c,d,e,f,g\} \subseteq g$
Thus $h^*(s_0)=\mathsf{cost}(a_1) + \mathsf{cost}(a_2) + \mathsf{cost}(a_6) + \mathsf{cost}(a_5)= 4$.
$h^{max}(s_0) = 3$
$h^{add}(s_0) = 11$
$h^{ff}(s_0) = 5$
LM-Cut Heuristic
Exercise: Consider the following STRIPS task $\Pi = \langle F, A, s_0, g \rangle$, where
$F=\{a,b,c,d,e,f\}$,
$s_0 = \{c,d\}$,
$g = \{a,e\}$,
$A = \{a_1,a_2,a_3,a_4,a_5\}$ defined below:
action | pre | add | cost |
$a_1$ | $\{d\}$ | $\{a,e\}$ | $3$ |
$a_2$ | $\{c,d\}$ | $\{e\}$ | $1$ |
$a_3$ | $\{d,e\}$ | $\{a,b\}$ | $1$ |
$a_4$ | $\{b\}$ | $\{d,f\}$ | $1$ |
$a_5$ | $\{b,e\}$ | $\{f\}$ | $2$ |
Compute $h^{LM-Cut}(s_0)$.
Do not forget to extend $\Pi$ by the by new facts $\bot$ and $\top$ so the new initial state is $\{\bot\}$ and the new goal state is $\{\top\}$.
Solution - Extend $\Pi$
$F = \{a,b,c,d,e,f,\top,\bot\}, s_0 = \{\bot\}, g = \{\top\}$, and $A = A \cup \{a_{\bot}, a_{\top}\}$, where
action | pre | add | cost |
$a_{\bot}$ | $\bot$ | $\{c,d\}$ | $0$ |
$a_{\top}$ | ${\a,e\}$ | $\top$ | $0$ |
Solution - $1^{st}$-iteration
action | $a_{\bot}$ | $a_{1}$ | $a_{2}$ | $a_{3}$ | $a_{4}$ | $a_{5}$ | $a_{\top}$ |
$\mathsf{cost}$ | 0 | 3 | 1 | 1 | 1 | 2 | 0 |
| $\bot$ | $a$ | $b$ | $c$ | $d$ | $e$ | $f$ | $\top$ |
init | 0 | $\infty$ | $\infty$ | $\infty$ | $\infty$ | $\infty$ | $\infty$ | $\infty$ |
1 | 0 | $\infty$ | $\infty$ | 0 | 0 | $\infty$ | $\infty$ | $\infty$ |
2 | 0 | 3 | $\infty$ | 0 | 0 | 1 | $\infty$ | $\infty$ |
3 | 0 | 2 | 2 | 0 | 0 | 1 | $\infty$ | $\infty$ |
4 | 0 | 2 | 2 | 0 | 0 | 1 | $\infty$ | 2 |
5 | 0 | 2 | 2 | 0 | 0 | 1 | 3 | 2 |
action | $a_{\bot}$ | $a_{1}$ | $a_{2}$ | $a_{3}$ | $a_{4}$ | $a_{5}$ | $a_{\top}$ |
choice ($\textrm{argmax}_{p \in pre_a} \boldsymbol{\sigma}(p))$ | $\bot$ | $d$ | $c$ | $e$ | $b$ | $b$ | $e$ |
(If tie-break use and lexicographical order.)
From justification graph $\rightarrow L = \{a_1, a_2\}$ and $\textrm{min}\{a_1,a_2\} = 1$.
action | $a_{\bot}$ | $a_{1}$ | $a_{2}$ | $a_{3}$ | $a_{4}$ | $a_{5}$ | $a_{\top}$ |
$\mathsf{cost}$ | 2 | 0 | 1 | 1 | 1 | 2 | 0 |
Patterns and LP-based heuristics
Exercise 1: Consider the following FDR task $\Pi = \langle V, A, s_0, g \rangle$, where
$V=\{v_a, v_b, v_c, v_d, v_e, v_f\}$, $dom(v_i) = \{0,1\}, \forall i \in \{a,b,c,d,e,f \}$,
$s_0 = \{v_a=0,v_b=0,v_c=1,v_d=1,v_e=0,v_f=0\}$,
$g = \{v_a=1,v_e=1\}$,
$A = \{a_1,a_2,a_3,a_4,a_5\}$ defined below:
action | pre | add | cost |
$a_1$ | $\{v_d=1\}$ | $\{v_a=1,v_e=1\}$ | $3$ |
$a_2$ | $\{v_c=1,v_d=1\}$ | $\{v_e=1\}$ | $1$ |
$a_3$ | $\{v_d=1,v_e=1\}$ | $\{v_a=1,v_b=1\}$ | $1$ |
$a_4$ | $\{v_b=1\}$ | $\{v_d=1,v_f=1\}$ | $1$ |
$a_5$ | $\{v_b=1,v_e=1\}$ | $\{v_f=1\}$ | $2$ |
Create a causal graph of $\Pi$, where vertices corresnponds to $V$ and edges to actions $a_k$, where for each $v_i,v_j \in V$ either $\mathsf{pre}_{a_k}(v_i)$ and $\mathsf{eff}_{a_k}(v_j)$ are defined or $\mathsf{eff}_{a_k}(v_i)$ and $\mathsf{eff}_{a_k}(v_j)$ are defined. And from the graph select interesting patterns up to size 2.
Solution - interesting patterns
Causal graph:
Interesting patterns of size 1 are $P_1 = \{ \{v_a\}, \{v_e\} \}$, since $g(v_a)$ and $g(v_e)$ are defined.
Interesting patterns of size 2 are $P_2 = \{ \{v_a,v_b\}, \{v_a, v_d\}, \{v_a,v_e\}, \{v_b,v_e\}, \{v_c,v_e\}, \{v_d,v_e\} \}$, where each contains variable $v_i$ for which $g(v_i)$ is defined.
Hence, all interesting patters $P_{\rm up 2}$ are $P_{\rm up 2} = P_1 \cup P_2$.
Take the interesting patterns $P$ and used them to form a linear program to compute post-hoc heuristic value.
Solution - post-hoc heuristic
$P \in P_{\rm up2}$ | $\{v_a\}$ | $\{v_e\}$ | $\{v_a,v_b\}$ | $\{v_a,v_d\}$ | $\{v_a,v_e\}$ | $\{v_b,v_e\}$ | $\{v_c,v_e\}$ | $\{v_d,v_e\}$ |
$h_P(s_0)$ | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 |
$P \in P_{\rm up2}$ | $\{v_a\}$ | $\{v_e\}$ | $\{v_a,v_b\}$ | $\{v_a,v_d\}$ | $\{v_a,v_e\}$ | $\{v_b,v_e\}$ | $\{v_c,v_e\}$ | $\{v_d,v_e\}$ |
$a_1$ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
$a_2$ | ✘ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✔ |
$a_3$ | ✔ | ✘ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ |
$a_4$ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
$a_5$ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
\[ \min 3X_1 + X_2 + X_3 + X_4 + 2X_5 \qquad\text{subject to} \]
\begin{align*}
X_3 \geq 1 \\
X_2 \geq 1 \\
X_3 \geq 1 \\
X_3 \geq 1 \\
X_2 + X_3\geq 2 \\
X_2 \geq 1 \\
X_2 \geq 1 \\
X_2 \geq 1
\end{align*}
Solution of LP is 3.
Exercise 2: Consider the following FDR task $\Pi = \langle V, A, s_0, g \rangle$, where
$V=\{v_1, v_2, v_3\}$, $dom(v_1) = \{d,e\}, dom(v_2)=\{f,g\}, dom(v_3)=\{h,j,k\}$,
$s_0 = \{v_1=d,v_2=f,v_3=h\}$,
$g = \{v_1=d,v_3=k\}$,
$A = \{a_1,a_2,a_3,a_4,a_5\}$ defined below:
action | pre | add | cost |
$a_1$ | $v_1=d,v_3=h$ | $v_1=e,v_3=j$ | $2$ |
$a_2$ | $v_1=d$ | $v_2=g$ | $1$ |
$a_3$ | $v_2=g,v_3=j$ | $v_3=k$ | $1$ |
$a_4$ | $v_1=e$ | $v_1=d$ | $2$ |
$a_5$ | $v_3=h$ | $v_3=j$ | $5$ |
Given projection $P=\{v_3\}$, compute the heuristic $h_{P}(s_0)$ and post-hoc heuristic.
Solution - heuristic
$P=\{v_3\}$, $s_0 = \{v_3=h\}, g=\{v_3=k\}$, and $A$ is
action | pre | add | cost |
$a_1$ | $v_3=h$ | $v_3=j$ | $2$ |
$a_3$ | $v_3=j$ | $v_3=k$ | $1$ |
The constructed LTS is following $ \{v_3=h\} \stackrel{a_1}{\longrightarrow} \{v_3=j\} \stackrel{a_3}{\longrightarrow} \{v_3=k\}$. And $h_{P}(s_0) = \mathsf{cost}(a_1) + \mathsf{cost}(a_3) = 2 + 1 = 3$.
Solution - post-hoc heuristic
\[ \min 2X_1 + X_3 + 5X_5 \qquad\text{subject to} \]
\begin{align*}
2X_1 + X_3 + 5X_5 \geq 3
\end{align*}
The optimal value of LP is 3.
Stochastic shortest path (SSP)
Exercise: Wanting to read a book as soon as possible, we decide to go and get it. We can either take bus or bike to the city center, where there are a library as well as a shop. The problem is modelled as the following LTS.
At first, compute the heuristic values of each state $S = \{home, station, bus, library, shop, reading\}$.
Solution - heuristic values
state $s$ | $home$ | $station$ | $bus$ | $library$ | $shop$ | $reading$ |
$h(s)$ | 21 | 26 | 16 | 1 | 2 | 0 |
Then perform ILAO$^*$ and get optimal policy $\pi^*$.
Solution - ILAO$^*$
Recall two steps of ILAO$^*$:
Determine solution graph
Do Value Iteration until policy $\pi$ of any $s$ is changed (then create new solution graph) or the maximal change in Bellman update of $V$ is less than $\epsilon = 0.01$ (then terminate algorithm).
Value iteration of states in solution graph ($home$, $station$, $bus$, $library$, $reading$) according to $Q(s,a) = \textrm{min}_{a \in A} \mathsf{cost}(a) + \sum_{t \in T(s,a)} P(t|s,a) V(t)$.
| $home$ | $station$ | $bus$ | $library$ | $reading$ | $\pi(home)$ | $\pi(station)$ | $\pi(bus)$ | $\pi(library)$ | $\pi(reading)$ | Res($home$) | Res($station$) | Res($bus$) | Res($library$) | Res($reading$) |
$h(s)$ | 21 | 26 | 16 | 1 | 0 | - | - | - | - | - | - | - | - | - | - |
$V_1(s)$ | 24 | 27 | 16 | 1.4 | 0 | bus | wait | take | borrow | - | 3 | 1 | 0 | 0.4 | 0 |
$V_2(s)$ | 24.3 | 27.1 | 16.4 | 1.56 | 0 | bus | wait | take | borrow | - | 0.3 | 0.1 | 0.4 | 0.16 | 0 |
$V_3(s)$ | 24.61 | 27.47 | 16.56 | 1.624 | 0 | bus | wait | take | borrow | - | 0.31 | 0.37 | 0.16 | 0.064 | 0 |
… | | | | | | | | | | | | | | | |
$V_{\infty}(s)$ | 25 | 28 | 17 | 1.7 | 0 | bus | wait | take | borrow | - | $\approx 0.01$ | $\approx 0.01$ | $\approx 0.01$ | $\approx 0.01$ | 0 |
Optimal policy is $\pi^*(home) = go to bus$, $\pi^*(station) = wait$, $\pi^*(bus) = take bus$, $\pi^*(library) = borrow$.