===== 13./14. Lets prepare for exam - Exercises ===== In this tutorial, we will go over several heuristics of classical planning and probabilistic planning as a preparation for exam. If you have your own exercises or you struggle with any topic, prepare it for the tutorial. ==== Delete relaxation heuristics ==== A relaxed planning task of STRIPS task $\Pi = \langle F, A, s_0, g \rangle$, where $a = \langle pre_a, add_a, del_a \rangle$ is a task $Pi^+ = \langle F, A^+, s_0, g \rangle$, where the delete effects of actions are omitted, i.e., $a^+ = \langle pre_{a^+}, add_{a^+}, \emptyset \rangle$. The optimal heuristic of $\Pi$ is denoted $h^*$ and for $\Pi^+$. However, the heuristic values cannot be computed in polynomial time, therefore, we use estimates: $h^{max}, h^{add}, h^{ff}$. In this tutorial, we consider $\Pi = \Pi^+$ and $h^{max} \leq h^+ = h^* \leq h^{add}$ and $h^+ = h^* \leq h^{ff}$ hold. **Exercise:** Consider the following STRIPS task $\Pi = \langle F, A, s_0, g \rangle$, where * $F=\{a,b,c,d,e,f,g,h\}$, * $s_0 = \{a\}$, * $g = \{c,d,e,f,g\}$, * $A = \{a_1,a_2,a_3,a_4,a_5,a_6\}$ defined below: ^ action ^ pre ^ add ^ cost ^ | $a_1$ | $\{a\}$ | $\{b,c\}$ | $1$ | | $a_2$ | $\{a,c\}$ | $\{d\}$ | $1$ | | $a_3$ | $\{b,c\}$ | $\{e\}$ | $1$ | | $a_4$ | $\{b\}$ | $\{f\}$ | $1$ | | $a_5$ | $\{d\}$ | $\{e,f\}$ | $1$ | | $a_6$ | $\{d\}$ | $\{g\}$ | $1$ | Compute the perfect heuristic $h^*$, $h^{max}$, $h^{add}$, and $h^{ff}$ for the initial state, i.e., $h^*(s_0), h^{max}(s_0),h^{add}(s_0)$ and $h^{ff}(s_0)$. ++++ Solution| {{ :courses:pui:tutorials:solution.png?400 |}} * The optimal $s_0$-plan is $\pi=(a_1, a_2, a_6, a_5)$ generating the following sequence of states: $s_0=\{a\} \stackrel{a_1}{\longrightarrow} \{a,b,c\} \stackrel{a_2}{\longrightarrow} \{a,b,c,d\} \stackrel{a_6}{\longrightarrow} \{a,b,c,d,g\} \stackrel{a_5}{\longrightarrow} \{a,b,c,d,e,f,g\} \subseteq g$ Thus $h^*(s_0)=\mathsf{cost}(a_1) + \mathsf{cost}(a_2) + \mathsf{cost}(a_6) + \mathsf{cost}(a_5)= 4$. * $h^{max}(s_0) = 3$ * $h^{add}(s_0) = 11$ * $h^{ff}(s_0) = 5$ ++++ ==== LM-Cut Heuristic ==== **Exercise:** Consider the following STRIPS task $\Pi = \langle F, A, s_0, g \rangle$, where * $F=\{a,b,c,d,e,f\}$, * $s_0 = \{c,d\}$, * $g = \{a,e\}$, * $A = \{a_1,a_2,a_3,a_4,a_5\}$ defined below: ^ action ^ pre ^ add ^ cost ^ | $a_1$ | $\{d\}$ | $\{a,e\}$ | $3$ | | $a_2$ | $\{c,d\}$ | $\{e\}$ | $1$ | | $a_3$ | $\{d,e\}$ | $\{a,b\}$ | $1$ | | $a_4$ | $\{b\}$ | $\{d,f\}$ | $1$ | | $a_5$ | $\{b,e\}$ | $\{f\}$ | $2$ | Compute $h^{LM-Cut}(s_0)$. Do not forget to extend $\Pi$ by the by new facts $\bot$ and $\top$ so the new initial state is $\{\bot\}$ and the new goal state is $\{\top\}$. ++++ Solution - Extend $\Pi$| $F = \{a,b,c,d,e,f,\top,\bot\}, s_0 = \{\bot\}, g = \{\top\}$, and $A = A \cup \{a_{\bot}, a_{\top}\}$, where ^ action ^ pre ^ add ^ cost ^ | $a_{\bot}$ | $\bot$ | $\{c,d\}$ | $0$ | | $a_{\top}$ | ${\a,e\}$ | $\top$ | $0$ | ++++ ++++ Solution - $1^{st}$-iteration| ^ action ^ $a_{\bot}$ ^ $a_{1}$ ^ $a_{2}$ ^ $a_{3}$ ^ $a_{4}$ ^ $a_{5}$ ^ $a_{\top}$ ^ | $\mathsf{cost}$ | 0 | 3 | 1 | 1 | 1 |2| 0 | * $h^{max}$: $h^{max}(\top) = 2$ ^ ^ $\bot$ ^ $a$ ^ $b$^ $c$ ^ $d$ ^ $e$ ^ $f$ ^ $\top$ ^ | init | 0 | $\infty$ | $\infty$ | $\infty$| $\infty$ | $\infty$ | $\infty$ | $\infty$ | | 1 | 0 | $\infty$ | $\infty$ | 0 | 0| $\infty$ | $\infty$ | $\infty$ | | 2 | 0 | 3 | $\infty$ | 0 | 0| 1| $\infty$ | $\infty$ | | 3 | 0 | 2 | 2 | 0 | 0| 1| $\infty$ | $\infty$ | | 4 | 0 | 2 | 2 | 0 | 0| 1| $\infty$ | 2 | | 5 | 0 | 2 | 2 | 0 | 0| 1|3 | 2 | * landmarks $L$ from justification graph ^ action ^ $a_{\bot}$ ^ $a_{1}$ ^ $a_{2}$ ^ $a_{3}$ ^ $a_{4}$ ^ $a_{5}$ ^ $a_{\top}$ ^ | **choice** ($\textrm{argmax}_{p \in pre_a} \boldsymbol{\sigma}(p))$ | $\bot$ | $d$ | $c$ | $e$ | $b$ | $b$ | $e$ | (If tie-break use and lexicographical order.) From justification graph $\rightarrow L = \{a_1, a_2\}$ and $\textrm{min}\{a_1,a_2\} = 1$. * Update action costs ^ action ^ $a_{\bot}$ ^ $a_{1}$ ^ $a_{2}$ ^ $a_{3}$ ^ $a_{4}$ ^ $a_{5}$ ^ $a_{\top}$ ^ | $\mathsf{cost}$ | 2 | 0 | 1 | 1 | 1 |2| 0 | ++++ /* ++++ Solution - $2^{st}$-iteration| ^ action ^ $a_{\bot}$ ^ $a_{1}$ ^ $a_{2}$ ^ $a_{3}$ ^ $a_{4}$ ^ $a_{5}$ ^ $a_{\top}$ ^ | $\mathsf{cost}$ | 0 | 2 | 0 | 0 | 0 |1| 0 | * $h^{max}$: $h^{max}(\top) = 0 \rightarrow $ LM-Cut stops. ^ F ^ $\bot$ ^ $a$ ^ $b$^ $c$ ^ $d$ ^ $e$ ^ $f$ ^ $\top$ ^ | init | 0 | $\infty$ | $\infty$ | $\infty$| $\infty$ | $\infty$ | $\infty$ | $\infty$ | | 1 | 0 | $\infty$ | $\infty$ | 0 | 0| $\infty$ | $\infty$ | $\infty$ | | 2 | 0 | 2 | $\infty$ | 0 | 0| 0| $\infty$ | $\infty$ | | 3 | 0 | 0 | 0 | 0 | 0| 0| $\infty$ | $\infty$ | | 4 | 0 | 0 | 0 | 0 | 0| 0| $\infty$ | 0| | 5 | 0 | 0 | 0 | 0 | 0| 0| 0|0 | ++++ */ ==== Patterns and LP-based heuristics ==== **Exercise 1:** Consider the following FDR task $\Pi = \langle V, A, s_0, g \rangle$, where * $V=\{v_a, v_b, v_c, v_d, v_e, v_f\}$, $dom(v_i) = \{0,1\}, \forall i \in \{a,b,c,d,e,f \}$, * $s_0 = \{v_a=0,v_b=0,v_c=1,v_d=1,v_e=0,v_f=0\}$, * $g = \{v_a=1,v_e=1\}$, * $A = \{a_1,a_2,a_3,a_4,a_5\}$ defined below: ^ action ^ pre ^ add ^ cost ^ | $a_1$ | $\{v_d=1\}$ | $\{v_a=1,v_e=1\}$ | $3$ | | $a_2$ | $\{v_c=1,v_d=1\}$ | $\{v_e=1\}$ | $1$ | | $a_3$ | $\{v_d=1,v_e=1\}$ | $\{v_a=1,v_b=1\}$ | $1$ | | $a_4$ | $\{v_b=1\}$ | $\{v_d=1,v_f=1\}$ | $1$ | | $a_5$ | $\{v_b=1,v_e=1\}$ | $\{v_f=1\}$ | $2$ | Create a causal graph of $\Pi$, where vertices corresnponds to $V$ and edges to actions $a_k$, where for each $v_i,v_j \in V$ either $\mathsf{pre}_{a_k}(v_i)$ and $\mathsf{eff}_{a_k}(v_j)$ are defined or $\mathsf{eff}_{a_k}(v_i)$ and $\mathsf{eff}_{a_k}(v_j)$ are defined. And from the graph select interesting patterns up to size 2. ++++ Solution - interesting patterns | Causal graph: {{ :courses:pui:tutorials:causal_graph.png?200 |}} Interesting patterns of size 1 are $P_1 = \{ \{v_a\}, \{v_e\} \}$, since $g(v_a)$ and $g(v_e)$ are defined. Interesting patterns of size 2 are $P_2 = \{ \{v_a,v_b\}, \{v_a, v_d\}, \{v_a,v_e\}, \{v_b,v_e\}, \{v_c,v_e\}, \{v_d,v_e\} \}$, where each contains variable $v_i$ for which $g(v_i)$ is defined. Hence, all interesting patters $P_{\rm up 2}$ are $P_{\rm up 2} = P_1 \cup P_2$. ++++ Take the interesting patterns $P$ and used them to form a linear program to compute post-hoc heuristic value. ++++ Solution - post-hoc heuristic | * Compute pattern database heuristic $h_P(s_0)$ for each interesting pattern $P \in P_{\rm up 2}$ from LTS. * LTS: * $P = \{v_a\}$: 0 --(a3)--> 1 * $P = \{v_e\}$: 0 --(a2)--> 1 * $P = \{v_a,v_b\}$: 00 -(a1)-> 10 | (a3) | v 11 * $P = \{v_a,v_d\}$: 01 -(a3)-> 11 * $P = \{v_a,v_e\}$: 00 -(a2)-> 01 -(a3)-> 11 | ^ | | --------(a1)------ * $P = \{v_b,v_e\}$: 00 -(a2)-> 01 * $P= \{v_c,v_e \}$: 00 -(a1) -> 01 10 -(a2)-> 11 * $P = \{v_d,v_e\}$: 00 -(a4)-> 10 -(a2)-> 11 * Heuristic: ^ $P \in P_{\rm up2}$ ^ $\{v_a\}$ ^ $\{v_e\}$ ^ $\{v_a,v_b\}$ ^ $\{v_a,v_d\}$ ^ $\{v_a,v_e\}$ ^ $\{v_b,v_e\}$ ^ $\{v_c,v_e\}$ ^ $\{v_d,v_e\}$ ^ | $h_P(s_0)$ | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | * What actions affect $P_{\rm up2}$? ^ $P \in P_{\rm up2}$ ^ $\{v_a\}$ ^ $\{v_e\}$ ^ $\{v_a,v_b\}$ ^ $\{v_a,v_d\}$ ^ $\{v_a,v_e\}$ ^ $\{v_b,v_e\}$ ^ $\{v_c,v_e\}$ ^ $\{v_d,v_e\}$ ^ | $a_1$ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | | $a_2$ | ✘ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✔ | | $a_3$ | ✔ | ✘ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | | $a_4$ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | | $a_5$ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | *Create LP and compute its optimal value. \[ \min 3X_1 + X_2 + X_3 + X_4 + 2X_5 \qquad\text{subject to} \] \begin{align*} X_3 \geq 1 \\ X_2 \geq 1 \\ X_3 \geq 1 \\ X_3 \geq 1 \\ X_2 + X_3\geq 2 \\ X_2 \geq 1 \\ X_2 \geq 1 \\ X_2 \geq 1 \end{align*} Solution of LP is 3. ++++ **Exercise 2:** Consider the following FDR task $\Pi = \langle V, A, s_0, g \rangle$, where * $V=\{v_1, v_2, v_3\}$, $dom(v_1) = \{d,e\}, dom(v_2)=\{f,g\}, dom(v_3)=\{h,j,k\}$, * $s_0 = \{v_1=d,v_2=f,v_3=h\}$, * $g = \{v_1=d,v_3=k\}$, * $A = \{a_1,a_2,a_3,a_4,a_5\}$ defined below: ^ action ^ pre ^ add ^ cost ^ | $a_1$ | $v_1=d,v_3=h$ | $v_1=e,v_3=j$ | $2$ | | $a_2$ | $v_1=d$ | $v_2=g$ | $1$ | | $a_3$ | $v_2=g,v_3=j$ | $v_3=k$ | $1$ | | $a_4$ | $v_1=e$ | $v_1=d$ | $2$ | | $a_5$ | $v_3=h$ | $v_3=j$ | $5$ | Given projection $P=\{v_3\}$, compute the heuristic $h_{P}(s_0)$ and post-hoc heuristic. ++++ Solution - heuristic| $P=\{v_3\}$, $s_0 = \{v_3=h\}, g=\{v_3=k\}$, and $A$ is ^ action ^ pre ^ add ^ cost ^ | $a_1$ | $v_3=h$ | $v_3=j$ | $2$ | | $a_3$ | $v_3=j$ | $v_3=k$ | $1$ | The constructed LTS is following $ \{v_3=h\} \stackrel{a_1}{\longrightarrow} \{v_3=j\} \stackrel{a_3}{\longrightarrow} \{v_3=k\}$. And $h_{P}(s_0) = \mathsf{cost}(a_1) + \mathsf{cost}(a_3) = 2 + 1 = 3$. ++++ ++++ Solution - post-hoc heuristic | * For $P \in \{\{v_3\}\}$, compute $h_{P}(s_0)$: $h_{\{v_3\}}(s_0) = 3$ * What actions affect $P$? $A_P = \{a_1,a_3,a_5\}$ * Create linear program, where variables correspond to $A_P$: $X_1, X_3, X_5$. \[ \min 2X_1 + X_3 + 5X_5 \qquad\text{subject to} \] \begin{align*} 2X_1 + X_3 + 5X_5 \geq 3 \end{align*} The optimal value of LP is 3. ++++ ==== Stochastic shortest path (SSP) ==== **Exercise:** Wanting to read a book as soon as possible, we decide to go and get it. We can either take bus or bike to the city center, where there are a library as well as a shop. The problem is modelled as the following LTS. {{ :courses:pui:tutorials:ssp2.png?400 |}} At first, compute the heuristic values of each state $S = \{home, station, bus, library, shop, reading\}$. ++++ Solution - heuristic values | ^ state $s$ ^ $home$ ^ $station$ ^ $bus$ ^ $library$ ^ $shop$ ^ $reading$ ^ | $h(s)$ | 21 | 26 | 16 | 1 | 2 | 0 | ++++ Then perform ILAO$^*$ and get optimal policy $\pi^*$. ++++ Solution - ILAO$^*$ | Recall two steps of ILAO$^*$: - Determine solution graph - Do Value Iteration until policy $\pi$ of any $s$ is changed (then create new solution graph) or the maximal change in Bellman update of $V$ is less than $\epsilon = 0.01$ (then terminate algorithm). * Solution graph: {{ :courses:pui:tutorials:ssp-sg.png?400 |}} * Value iteration of states in solution graph ($home$, $station$, $bus$, $library$, $reading$) according to $Q(s,a) = \textrm{min}_{a \in A} \mathsf{cost}(a) + \sum_{t \in T(s,a)} P(t|s,a) V(t)$. ^ ^ $home$ ^ $station$ ^ $bus$ ^ $library$ ^ $reading$ ^ $\pi(home)$ ^ $\pi(station)$ ^ $\pi(bus)$ ^ $\pi(library)$ ^ $\pi(reading)$ ^ Res($home$) ^ Res($station$) ^ Res($bus$) ^ Res($library$) ^ Res($reading$) ^ | $h(s)$ | 21 | 26 | 16 | 1 | 0 | - | - | - | -| - | - | -| - | - | - | | $V_1(s)$ | 24 | 27 | 16 | 1.4 | 0 | bus | wait | take | borrow | - |**3** | 1 | 0 | 0.4 | 0 | | $V_2(s)$ | 24.3 | 27.1 | 16.4 | 1.56 | 0 | bus | wait | take | borrow | - |0.3 | 0.1 | **0.4** | 0.16 | 0 | | $V_3(s)$ | 24.61 | 27.47 | 16.56 | 1.624 | 0 | bus | wait | take | borrow | - |0.31 | **0.37** | 0.16 | 0.064 | 0 | | ... | | $V_{\infty}(s)$ | 25 | 28 | 17| 1.7 | 0 | bus | wait | take | borrow | - | $\approx 0.01$ | $\approx 0.01$ | $\approx 0.01$ | $\approx 0.01$ | 0 | Optimal policy is $\pi^*(home) = go to bus$, $\pi^*(station) = wait$, $\pi^*(bus) = take bus$, $\pi^*(library) = borrow$. ++++