===== 13./14. Lets prepare for exam - Exercises =====
In this tutorial, we will go over several heuristics of classical planning and probabilistic planning as a preparation for exam. If you have your own exercises or you struggle with any topic, prepare it for the tutorial.
==== Delete relaxation heuristics ====
A relaxed planning task of STRIPS task $\Pi = \langle F, A, s_0, g \rangle$, where $a = \langle pre_a, add_a, del_a \rangle$ is a task $Pi^+ = \langle F, A^+, s_0, g \rangle$, where the delete effects of actions are omitted, i.e., $a^+ = \langle pre_{a^+}, add_{a^+}, \emptyset \rangle$.
The optimal heuristic of $\Pi$ is denoted $h^*$ and for $\Pi^+$. However, the heuristic values cannot be computed in polynomial time, therefore, we use estimates: $h^{max}, h^{add}, h^{ff}$. In this tutorial, we consider $\Pi = \Pi^+$ and $h^{max} \leq h^+ = h^* \leq h^{add}$ and $h^+ = h^* \leq h^{ff}$ hold.
**Exercise:** Consider the following STRIPS task $\Pi = \langle F, A, s_0, g \rangle$, where
* $F=\{a,b,c,d,e,f,g,h\}$,
* $s_0 = \{a\}$,
* $g = \{c,d,e,f,g\}$,
* $A = \{a_1,a_2,a_3,a_4,a_5,a_6\}$ defined below:
^ action ^ pre ^ add ^ cost ^
| $a_1$ | $\{a\}$ | $\{b,c\}$ | $1$ |
| $a_2$ | $\{a,c\}$ | $\{d\}$ | $1$ |
| $a_3$ | $\{b,c\}$ | $\{e\}$ | $1$ |
| $a_4$ | $\{b\}$ | $\{f\}$ | $1$ |
| $a_5$ | $\{d\}$ | $\{e,f\}$ | $1$ |
| $a_6$ | $\{d\}$ | $\{g\}$ | $1$ |
Compute the perfect heuristic $h^*$, $h^{max}$, $h^{add}$, and $h^{ff}$ for the initial state, i.e., $h^*(s_0), h^{max}(s_0),h^{add}(s_0)$ and $h^{ff}(s_0)$.
++++ Solution|
{{ :courses:pui:tutorials:solution.png?400 |}}
* The optimal $s_0$-plan is $\pi=(a_1, a_2, a_6, a_5)$ generating the following sequence of states:
$s_0=\{a\} \stackrel{a_1}{\longrightarrow} \{a,b,c\} \stackrel{a_2}{\longrightarrow} \{a,b,c,d\} \stackrel{a_6}{\longrightarrow} \{a,b,c,d,g\} \stackrel{a_5}{\longrightarrow} \{a,b,c,d,e,f,g\} \subseteq g$
Thus $h^*(s_0)=\mathsf{cost}(a_1) + \mathsf{cost}(a_2) + \mathsf{cost}(a_6) + \mathsf{cost}(a_5)= 4$.
* $h^{max}(s_0) = 3$
* $h^{add}(s_0) = 11$
* $h^{ff}(s_0) = 5$
++++
==== LM-Cut Heuristic ====
**Exercise:** Consider the following STRIPS task $\Pi = \langle F, A, s_0, g \rangle$, where
* $F=\{a,b,c,d,e,f\}$,
* $s_0 = \{c,d\}$,
* $g = \{a,e\}$,
* $A = \{a_1,a_2,a_3,a_4,a_5\}$ defined below:
^ action ^ pre ^ add ^ cost ^
| $a_1$ | $\{d\}$ | $\{a,e\}$ | $3$ |
| $a_2$ | $\{c,d\}$ | $\{e\}$ | $1$ |
| $a_3$ | $\{d,e\}$ | $\{a,b\}$ | $1$ |
| $a_4$ | $\{b\}$ | $\{d,f\}$ | $1$ |
| $a_5$ | $\{b,e\}$ | $\{f\}$ | $2$ |
Compute $h^{LM-Cut}(s_0)$.
Do not forget to extend $\Pi$ by the by new facts $\bot$ and $\top$ so the new initial state is $\{\bot\}$ and the new goal state is $\{\top\}$.
++++ Solution - Extend $\Pi$|
$F = \{a,b,c,d,e,f,\top,\bot\}, s_0 = \{\bot\}, g = \{\top\}$, and $A = A \cup \{a_{\bot}, a_{\top}\}$, where
^ action ^ pre ^ add ^ cost ^
| $a_{\bot}$ | $\bot$ | $\{c,d\}$ | $0$ |
| $a_{\top}$ | ${\a,e\}$ | $\top$ | $0$ |
++++
++++ Solution - $1^{st}$-iteration|
^ action ^ $a_{\bot}$ ^ $a_{1}$ ^ $a_{2}$ ^ $a_{3}$ ^ $a_{4}$ ^ $a_{5}$ ^ $a_{\top}$ ^
| $\mathsf{cost}$ | 0 | 3 | 1 | 1 | 1 |2| 0 |
* $h^{max}$: $h^{max}(\top) = 2$
^ ^ $\bot$ ^ $a$ ^ $b$^ $c$ ^ $d$ ^ $e$ ^ $f$ ^ $\top$ ^
| init | 0 | $\infty$ | $\infty$ | $\infty$| $\infty$ | $\infty$ | $\infty$ | $\infty$ |
| 1 | 0 | $\infty$ | $\infty$ | 0 | 0| $\infty$ | $\infty$ | $\infty$ |
| 2 | 0 | 3 | $\infty$ | 0 | 0| 1| $\infty$ | $\infty$ |
| 3 | 0 | 2 | 2 | 0 | 0| 1| $\infty$ | $\infty$ |
| 4 | 0 | 2 | 2 | 0 | 0| 1| $\infty$ | 2 |
| 5 | 0 | 2 | 2 | 0 | 0| 1|3 | 2 |
* landmarks $L$ from justification graph
^ action ^ $a_{\bot}$ ^ $a_{1}$ ^ $a_{2}$ ^ $a_{3}$ ^ $a_{4}$ ^ $a_{5}$ ^ $a_{\top}$ ^
| **choice** ($\textrm{argmax}_{p \in pre_a} \boldsymbol{\sigma}(p))$ | $\bot$ | $d$ | $c$ | $e$ | $b$ | $b$ | $e$ |
(If tie-break use and lexicographical order.)
From justification graph $\rightarrow L = \{a_1, a_2\}$ and $\textrm{min}\{a_1,a_2\} = 1$.
* Update action costs
^ action ^ $a_{\bot}$ ^ $a_{1}$ ^ $a_{2}$ ^ $a_{3}$ ^ $a_{4}$ ^ $a_{5}$ ^ $a_{\top}$ ^
| $\mathsf{cost}$ | 2 | 0 | 1 | 1 | 1 |2| 0 |
++++
/*
++++ Solution - $2^{st}$-iteration|
^ action ^ $a_{\bot}$ ^ $a_{1}$ ^ $a_{2}$ ^ $a_{3}$ ^ $a_{4}$ ^ $a_{5}$ ^ $a_{\top}$ ^
| $\mathsf{cost}$ | 0 | 2 | 0 | 0 | 0 |1| 0 |
* $h^{max}$: $h^{max}(\top) = 0 \rightarrow $ LM-Cut stops.
^ F ^ $\bot$ ^ $a$ ^ $b$^ $c$ ^ $d$ ^ $e$ ^ $f$ ^ $\top$ ^
| init | 0 | $\infty$ | $\infty$ | $\infty$| $\infty$ | $\infty$ | $\infty$ | $\infty$ |
| 1 | 0 | $\infty$ | $\infty$ | 0 | 0| $\infty$ | $\infty$ | $\infty$ |
| 2 | 0 | 2 | $\infty$ | 0 | 0| 0| $\infty$ | $\infty$ |
| 3 | 0 | 0 | 0 | 0 | 0| 0| $\infty$ | $\infty$ |
| 4 | 0 | 0 | 0 | 0 | 0| 0| $\infty$ | 0|
| 5 | 0 | 0 | 0 | 0 | 0| 0| 0|0 |
++++
*/
==== Patterns and LP-based heuristics ====
**Exercise 1:** Consider the following FDR task $\Pi = \langle V, A, s_0, g \rangle$, where
* $V=\{v_a, v_b, v_c, v_d, v_e, v_f\}$, $dom(v_i) = \{0,1\}, \forall i \in \{a,b,c,d,e,f \}$,
* $s_0 = \{v_a=0,v_b=0,v_c=1,v_d=1,v_e=0,v_f=0\}$,
* $g = \{v_a=1,v_e=1\}$,
* $A = \{a_1,a_2,a_3,a_4,a_5\}$ defined below:
^ action ^ pre ^ add ^ cost ^
| $a_1$ | $\{v_d=1\}$ | $\{v_a=1,v_e=1\}$ | $3$ |
| $a_2$ | $\{v_c=1,v_d=1\}$ | $\{v_e=1\}$ | $1$ |
| $a_3$ | $\{v_d=1,v_e=1\}$ | $\{v_a=1,v_b=1\}$ | $1$ |
| $a_4$ | $\{v_b=1\}$ | $\{v_d=1,v_f=1\}$ | $1$ |
| $a_5$ | $\{v_b=1,v_e=1\}$ | $\{v_f=1\}$ | $2$ |
Create a causal graph of $\Pi$, where vertices corresnponds to $V$ and edges to actions $a_k$, where for each $v_i,v_j \in V$ either $\mathsf{pre}_{a_k}(v_i)$ and $\mathsf{eff}_{a_k}(v_j)$ are defined or $\mathsf{eff}_{a_k}(v_i)$ and $\mathsf{eff}_{a_k}(v_j)$ are defined. And from the graph select interesting patterns up to size 2.
++++ Solution - interesting patterns |
Causal graph:
{{ :courses:pui:tutorials:causal_graph.png?200 |}}
Interesting patterns of size 1 are $P_1 = \{ \{v_a\}, \{v_e\} \}$, since $g(v_a)$ and $g(v_e)$ are defined.
Interesting patterns of size 2 are $P_2 = \{ \{v_a,v_b\}, \{v_a, v_d\}, \{v_a,v_e\}, \{v_b,v_e\}, \{v_c,v_e\}, \{v_d,v_e\} \}$, where each contains variable $v_i$ for which $g(v_i)$ is defined.
Hence, all interesting patters $P_{\rm up 2}$ are $P_{\rm up 2} = P_1 \cup P_2$.
++++
Take the interesting patterns $P$ and used them to form a linear program to compute post-hoc heuristic value.
++++ Solution - post-hoc heuristic |
* Compute pattern database heuristic $h_P(s_0)$ for each interesting pattern $P \in P_{\rm up 2}$ from LTS.
* LTS:
* $P = \{v_a\}$: 0 --(a3)--> 1
* $P = \{v_e\}$: 0 --(a2)--> 1
* $P = \{v_a,v_b\}$:
00 -(a1)-> 10
|
(a3)
|
v
11
* $P = \{v_a,v_d\}$: 01 -(a3)-> 11
* $P = \{v_a,v_e\}$:
00 -(a2)-> 01 -(a3)-> 11
| ^
| |
--------(a1)------
* $P = \{v_b,v_e\}$: 00 -(a2)-> 01
* $P= \{v_c,v_e \}$: 00 -(a1) -> 01
10 -(a2)-> 11
* $P = \{v_d,v_e\}$: 00 -(a4)-> 10 -(a2)-> 11
* Heuristic:
^ $P \in P_{\rm up2}$ ^ $\{v_a\}$ ^ $\{v_e\}$ ^ $\{v_a,v_b\}$ ^ $\{v_a,v_d\}$ ^ $\{v_a,v_e\}$ ^ $\{v_b,v_e\}$ ^ $\{v_c,v_e\}$ ^ $\{v_d,v_e\}$ ^
| $h_P(s_0)$ | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 |
* What actions affect $P_{\rm up2}$?
^ $P \in P_{\rm up2}$ ^ $\{v_a\}$ ^ $\{v_e\}$ ^ $\{v_a,v_b\}$ ^ $\{v_a,v_d\}$ ^ $\{v_a,v_e\}$ ^ $\{v_b,v_e\}$ ^ $\{v_c,v_e\}$ ^ $\{v_d,v_e\}$ ^
| $a_1$ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
| $a_2$ | ✘ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✔ |
| $a_3$ | ✔ | ✘ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ |
| $a_4$ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
| $a_5$ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
*Create LP and compute its optimal value.
\[ \min 3X_1 + X_2 + X_3 + X_4 + 2X_5 \qquad\text{subject to} \]
\begin{align*}
X_3 \geq 1 \\
X_2 \geq 1 \\
X_3 \geq 1 \\
X_3 \geq 1 \\
X_2 + X_3\geq 2 \\
X_2 \geq 1 \\
X_2 \geq 1 \\
X_2 \geq 1
\end{align*}
Solution of LP is 3.
++++
**Exercise 2:** Consider the following FDR task $\Pi = \langle V, A, s_0, g \rangle$, where
* $V=\{v_1, v_2, v_3\}$, $dom(v_1) = \{d,e\}, dom(v_2)=\{f,g\}, dom(v_3)=\{h,j,k\}$,
* $s_0 = \{v_1=d,v_2=f,v_3=h\}$,
* $g = \{v_1=d,v_3=k\}$,
* $A = \{a_1,a_2,a_3,a_4,a_5\}$ defined below:
^ action ^ pre ^ add ^ cost ^
| $a_1$ | $v_1=d,v_3=h$ | $v_1=e,v_3=j$ | $2$ |
| $a_2$ | $v_1=d$ | $v_2=g$ | $1$ |
| $a_3$ | $v_2=g,v_3=j$ | $v_3=k$ | $1$ |
| $a_4$ | $v_1=e$ | $v_1=d$ | $2$ |
| $a_5$ | $v_3=h$ | $v_3=j$ | $5$ |
Given projection $P=\{v_3\}$, compute the heuristic $h_{P}(s_0)$ and post-hoc heuristic.
++++ Solution - heuristic|
$P=\{v_3\}$, $s_0 = \{v_3=h\}, g=\{v_3=k\}$, and $A$ is
^ action ^ pre ^ add ^ cost ^
| $a_1$ | $v_3=h$ | $v_3=j$ | $2$ |
| $a_3$ | $v_3=j$ | $v_3=k$ | $1$ |
The constructed LTS is following $ \{v_3=h\} \stackrel{a_1}{\longrightarrow} \{v_3=j\} \stackrel{a_3}{\longrightarrow} \{v_3=k\}$. And $h_{P}(s_0) = \mathsf{cost}(a_1) + \mathsf{cost}(a_3) = 2 + 1 = 3$.
++++
++++ Solution - post-hoc heuristic |
* For $P \in \{\{v_3\}\}$, compute $h_{P}(s_0)$: $h_{\{v_3\}}(s_0) = 3$
* What actions affect $P$? $A_P = \{a_1,a_3,a_5\}$
* Create linear program, where variables correspond to $A_P$: $X_1, X_3, X_5$.
\[ \min 2X_1 + X_3 + 5X_5 \qquad\text{subject to} \]
\begin{align*}
2X_1 + X_3 + 5X_5 \geq 3
\end{align*}
The optimal value of LP is 3.
++++
==== Stochastic shortest path (SSP) ====
**Exercise:** Wanting to read a book as soon as possible, we decide to go and get it. We can either take bus or bike to the city center, where there are a library as well as a shop. The problem is modelled as the following LTS.
{{ :courses:pui:tutorials:ssp2.png?400 |}}
At first, compute the heuristic values of each state $S = \{home, station, bus, library, shop, reading\}$.
++++ Solution - heuristic values |
^ state $s$ ^ $home$ ^ $station$ ^ $bus$ ^ $library$ ^ $shop$ ^ $reading$ ^
| $h(s)$ | 21 | 26 | 16 | 1 | 2 | 0 |
++++
Then perform ILAO$^*$ and get optimal policy $\pi^*$.
++++ Solution - ILAO$^*$ |
Recall two steps of ILAO$^*$:
- Determine solution graph
- Do Value Iteration until policy $\pi$ of any $s$ is changed (then create new solution graph) or the maximal change in Bellman update of $V$ is less than $\epsilon = 0.01$ (then terminate algorithm).
* Solution graph:
{{ :courses:pui:tutorials:ssp-sg.png?400 |}}
* Value iteration of states in solution graph ($home$, $station$, $bus$, $library$, $reading$) according to $Q(s,a) = \textrm{min}_{a \in A} \mathsf{cost}(a) + \sum_{t \in T(s,a)} P(t|s,a) V(t)$.
^ ^ $home$ ^ $station$ ^ $bus$ ^ $library$ ^ $reading$ ^ $\pi(home)$ ^ $\pi(station)$ ^ $\pi(bus)$ ^ $\pi(library)$ ^ $\pi(reading)$ ^ Res($home$) ^ Res($station$) ^ Res($bus$) ^ Res($library$) ^ Res($reading$) ^
| $h(s)$ | 21 | 26 | 16 | 1 | 0 | - | - | - | -| - | - | -| - | - | - |
| $V_1(s)$ | 24 | 27 | 16 | 1.4 | 0 | bus | wait | take | borrow | - |**3** | 1 | 0 | 0.4 | 0 |
| $V_2(s)$ | 24.3 | 27.1 | 16.4 | 1.56 | 0 | bus | wait | take | borrow | - |0.3 | 0.1 | **0.4** | 0.16 | 0 |
| $V_3(s)$ | 24.61 | 27.47 | 16.56 | 1.624 | 0 | bus | wait | take | borrow | - |0.31 | **0.37** | 0.16 | 0.064 | 0 |
| ... |
| $V_{\infty}(s)$ | 25 | 28 | 17| 1.7 | 0 | bus | wait | take | borrow | - | $\approx 0.01$ | $\approx 0.01$ | $\approx 0.01$ | $\approx 0.01$ | 0 |
Optimal policy is $\pi^*(home) = go to bus$, $\pi^*(station) = wait$, $\pi^*(bus) = take bus$, $\pi^*(library) = borrow$.
++++