===== 13./14. Lets prepare for exam - Exercises =====

In this tutorial, we will go over several heuristics of classical planning and probabilistic planning as a preparation for exam. If you have your own exercises or you struggle with any topic, prepare it for the tutorial.


==== Delete relaxation heuristics ====

A relaxed planning task of STRIPS task $\Pi = \langle F, A, s_0, g \rangle$, where $a = \langle pre_a, add_a, del_a \rangle$ is a task $Pi^+ = \langle F, A^+, s_0, g \rangle$, where the delete effects of actions are omitted, i.e., $a^+ = \langle pre_{a^+}, add_{a^+}, \emptyset \rangle$.
The optimal heuristic of $\Pi$ is denoted $h^*$ and for $\Pi^+$. However, the heuristic values cannot be computed in polynomial time, therefore, we use estimates: $h^{max}, h^{add}, h^{ff}$. In this tutorial, we consider $\Pi = \Pi^+$ and $h^{max} \leq h^+ = h^* \leq h^{add}$ and $h^+ = h^* \leq h^{ff}$ hold.

**Exercise:** Consider the following STRIPS task $\Pi = \langle F, A, s_0, g \rangle$, where
  * $F=\{a,b,c,d,e,f,g,h\}$,
  * $s_0 = \{a\}$,
  * $g = \{c,d,e,f,g\}$,
  * $A = \{a_1,a_2,a_3,a_4,a_5,a_6\}$ defined below:

^ action ^ pre ^ add ^ cost ^
| $a_1$ | $\{a\}$ | $\{b,c\}$ | $1$ |
| $a_2$ | $\{a,c\}$ | $\{d\}$ | $1$ |
| $a_3$ | $\{b,c\}$ | $\{e\}$ | $1$ |
| $a_4$ | $\{b\}$ | $\{f\}$ | $1$ |
| $a_5$ | $\{d\}$ | $\{e,f\}$ | $1$ |
| $a_6$ | $\{d\}$ | $\{g\}$ | $1$ |

Compute the perfect heuristic $h^*$, $h^{max}$, $h^{add}$, and $h^{ff}$ for the initial state, i.e., $h^*(s_0), h^{max}(s_0),h^{add}(s_0)$ and $h^{ff}(s_0)$.

++++ Solution|

{{ :courses:pui:tutorials:solution.png?400 |}}
  * The optimal $s_0$-plan is $\pi=(a_1, a_2, a_6, a_5)$ generating the following sequence of states: 

$s_0=\{a\} \stackrel{a_1}{\longrightarrow} \{a,b,c\} \stackrel{a_2}{\longrightarrow} \{a,b,c,d\} \stackrel{a_6}{\longrightarrow} \{a,b,c,d,g\} \stackrel{a_5}{\longrightarrow} \{a,b,c,d,e,f,g\} \subseteq g$

Thus $h^*(s_0)=\mathsf{cost}(a_1) + \mathsf{cost}(a_2) + \mathsf{cost}(a_6) + \mathsf{cost}(a_5)= 4$.


  * $h^{max}(s_0) = 3$
  * $h^{add}(s_0) = 11$
  * $h^{ff}(s_0) = 5$


++++

==== LM-Cut Heuristic ====

**Exercise:** Consider the following STRIPS task $\Pi = \langle F, A, s_0, g \rangle$, where
  * $F=\{a,b,c,d,e,f\}$,
  * $s_0 = \{c,d\}$,
  * $g = \{a,e\}$,
  * $A = \{a_1,a_2,a_3,a_4,a_5\}$ defined below:

^ action ^ pre ^ add ^ cost ^
| $a_1$ | $\{d\}$ | $\{a,e\}$ | $3$ |
| $a_2$ | $\{c,d\}$ | $\{e\}$ | $1$ |
| $a_3$ | $\{d,e\}$ | $\{a,b\}$ | $1$ |
| $a_4$ | $\{b\}$ | $\{d,f\}$ | $1$ |
| $a_5$ | $\{b,e\}$ | $\{f\}$ | $2$ |

Compute $h^{LM-Cut}(s_0)$.

Do not forget to extend $\Pi$ by the by new facts $\bot$ and $\top$ so the new initial state is $\{\bot\}$ and the new goal state is $\{\top\}$.

++++ Solution - Extend $\Pi$|
$F = \{a,b,c,d,e,f,\top,\bot\}, s_0 = \{\bot\}, g = \{\top\}$, and $A = A \cup \{a_{\bot}, a_{\top}\}$, where
^ action ^ pre ^ add ^ cost ^
| $a_{\bot}$ | $\bot$ | $\{c,d\}$ | $0$ |
| $a_{\top}$ | ${\a,e\}$ | $\top$ | $0$ |

++++

++++ Solution - $1^{st}$-iteration|
^  action ^ $a_{\bot}$ ^ $a_{1}$  ^ $a_{2}$ ^ $a_{3}$ ^ $a_{4}$ ^ $a_{5}$ ^ $a_{\top}$ ^ 
| $\mathsf{cost}$ | 0 | 3 | 1 | 1 | 1 |2| 0 | 

  * $h^{max}$: $h^{max}(\top) = 2$

^  ^ $\bot$ ^ $a$ ^ $b$^ $c$ ^ $d$ ^ $e$ ^ $f$ ^ $\top$ ^
| init | 0 | $\infty$ |  $\infty$ |  $\infty$|  $\infty$ |  $\infty$ |  $\infty$ |  $\infty$ |
| 1 | 0 | $\infty$ |  $\infty$ |  0 |  0| $\infty$ | $\infty$ |  $\infty$ |
| 2 | 0 | 3 |  $\infty$ |  0 |  0| 1| $\infty$ |  $\infty$ |
| 3 | 0 | 2 |  2 |  0 |  0| 1| $\infty$ |  $\infty$ |
| 4 | 0 | 2 |  2 |  0 |  0| 1| $\infty$ |  2 |
| 5 | 0 | 2 |  2 |  0 |  0| 1|3 |  2 |

  * landmarks $L$ from justification graph

^  action ^ $a_{\bot}$ ^ $a_{1}$  ^ $a_{2}$ ^ $a_{3}$ ^ $a_{4}$ ^ $a_{5}$ ^ $a_{\top}$ ^ 
| **choice** ($\textrm{argmax}_{p \in pre_a} \boldsymbol{\sigma}(p))$  | $\bot$ | $d$ | $c$ | $e$ | $b$ | $b$ | $e$ | 
 
(If tie-break use and lexicographical order.)

From justification graph $\rightarrow L = \{a_1, a_2\}$ and $\textrm{min}\{a_1,a_2\} = 1$.

  * Update action costs

^  action ^ $a_{\bot}$ ^ $a_{1}$  ^ $a_{2}$ ^ $a_{3}$ ^ $a_{4}$ ^ $a_{5}$ ^ $a_{\top}$ ^ 
| $\mathsf{cost}$ | 2 | 0 | 1 | 1 | 1 |2| 0 | 

++++

/*
++++ Solution - $2^{st}$-iteration|
^  action ^ $a_{\bot}$ ^ $a_{1}$  ^ $a_{2}$ ^ $a_{3}$ ^ $a_{4}$ ^ $a_{5}$ ^ $a_{\top}$ ^ 
| $\mathsf{cost}$ | 0 | 2 | 0 | 0 | 0 |1| 0 | 

  * $h^{max}$: $h^{max}(\top) = 0 \rightarrow $ LM-Cut stops.

^  F ^ $\bot$ ^ $a$ ^ $b$^ $c$ ^ $d$ ^ $e$ ^ $f$ ^ $\top$ ^
| init | 0 | $\infty$ |  $\infty$ |  $\infty$|  $\infty$ |  $\infty$ |  $\infty$ |  $\infty$ |
| 1 | 0 | $\infty$ |  $\infty$ |  0 |  0| $\infty$ | $\infty$ |  $\infty$ |
| 2 | 0 | 2 |  $\infty$ |  0 |  0| 0| $\infty$ |  $\infty$ |
| 3 | 0 | 0 |  0 |  0 |  0| 0| $\infty$ |  $\infty$ |
| 4 | 0 | 0 |  0 |  0 |  0| 0| $\infty$ | 0|
| 5 | 0 | 0 |  0 |  0 |  0| 0| 0|0 |

  
++++
*/
==== Patterns and LP-based heuristics  ====

**Exercise 1:** Consider the following FDR task $\Pi = \langle V, A, s_0, g \rangle$, where
  * $V=\{v_a, v_b, v_c, v_d, v_e, v_f\}$, $dom(v_i) = \{0,1\}, \forall i \in \{a,b,c,d,e,f \}$,
  * $s_0 = \{v_a=0,v_b=0,v_c=1,v_d=1,v_e=0,v_f=0\}$,
  * $g = \{v_a=1,v_e=1\}$,
  * $A = \{a_1,a_2,a_3,a_4,a_5\}$ defined below:

^ action ^ pre ^ add ^ cost ^
| $a_1$ | $\{v_d=1\}$ | $\{v_a=1,v_e=1\}$ | $3$ |
| $a_2$ | $\{v_c=1,v_d=1\}$ | $\{v_e=1\}$ | $1$ |
| $a_3$ | $\{v_d=1,v_e=1\}$ | $\{v_a=1,v_b=1\}$ | $1$ |
| $a_4$ | $\{v_b=1\}$ | $\{v_d=1,v_f=1\}$ | $1$ |
| $a_5$ | $\{v_b=1,v_e=1\}$ | $\{v_f=1\}$ | $2$ |

Create a causal graph of $\Pi$, where vertices corresnponds to $V$ and edges to actions $a_k$, where for each $v_i,v_j \in V$ either $\mathsf{pre}_{a_k}(v_i)$ and $\mathsf{eff}_{a_k}(v_j)$ are defined or $\mathsf{eff}_{a_k}(v_i)$ and $\mathsf{eff}_{a_k}(v_j)$ are defined. And from the graph select interesting patterns up to size 2.


++++ Solution - interesting patterns |
Causal graph:

{{ :courses:pui:tutorials:causal_graph.png?200 |}}

Interesting patterns of size 1 are $P_1 = \{ \{v_a\}, \{v_e\} \}$, since $g(v_a)$ and $g(v_e)$ are defined.
Interesting patterns of size 2 are $P_2 = \{ \{v_a,v_b\}, \{v_a, v_d\}, \{v_a,v_e\}, \{v_b,v_e\}, \{v_c,v_e\}, \{v_d,v_e\}  \}$, where each contains variable $v_i$ for which $g(v_i)$ is defined.
Hence, all interesting patters $P_{\rm up 2}$ are $P_{\rm up 2} = P_1 \cup P_2$.

++++ 

Take the interesting patterns $P$ and used them to form a linear program to compute post-hoc heuristic value.
++++ Solution - post-hoc heuristic |
  * Compute pattern database heuristic $h_P(s_0)$ for each interesting pattern $P \in P_{\rm up 2}$ from LTS.
        * LTS: 
             * $P = \{v_a\}$: <code> 0 --(a3)--> 1 </code>
             * $P = \{v_e\}$: <code> 0 --(a2)--> 1 </code>
             * $P = \{v_a,v_b\}$: <code> 
00 -(a1)-> 10 
 |
(a3)
 |
 v
11
</code>
             * $P = \{v_a,v_d\}$: <code>  01 -(a3)-> 11 </code>
             * $P = \{v_a,v_e\}$: <code> 
00 -(a2)-> 01 -(a3)-> 11
 |                  ^
 |                  |
  --------(a1)------
</code>
             * $P = \{v_b,v_e\}$: <code> 00 -(a2)-> 01 </code>
             * $P= \{v_c,v_e \}$: <code> 00 -(a1) -> 01
10 -(a2)-> 11
</code>
             * $P = \{v_d,v_e\}$: <code> 00 -(a4)-> 10 -(a2)-> 11 </code>
        * Heuristic:

^  $P \in P_{\rm up2}$ ^ $\{v_a\}$ ^ $\{v_e\}$ ^ $\{v_a,v_b\}$ ^ $\{v_a,v_d\}$ ^ $\{v_a,v_e\}$ ^ $\{v_b,v_e\}$ ^ $\{v_c,v_e\}$ ^ $\{v_d,v_e\}$ ^
| $h_P(s_0)$ | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 |

  * What actions affect $P_{\rm up2}$? 

^  $P \in P_{\rm up2}$ ^ $\{v_a\}$ ^ $\{v_e\}$ ^ $\{v_a,v_b\}$ ^ $\{v_a,v_d\}$ ^ $\{v_a,v_e\}$ ^ $\{v_b,v_e\}$ ^ $\{v_c,v_e\}$ ^ $\{v_d,v_e\}$ ^
| $a_1$ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
| $a_2$ | ✘ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✔ |
| $a_3$ | ✔ | ✘ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ |
| $a_4$ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
| $a_5$ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |

  *Create LP and compute its optimal value. 

\[ \min 3X_1 + X_2 + X_3 + X_4 + 2X_5 \qquad\text{subject to} \] 
\begin{align*} 
  X_3 \geq 1 \\
  X_2 \geq 1 \\
  X_3 \geq 1 \\
   X_3 \geq 1 \\
  X_2 + X_3\geq 2 \\
  X_2 \geq 1 \\
  X_2 \geq 1 \\
  X_2 \geq 1
\end{align*}

Solution of LP is 3.

++++ 


**Exercise 2:** Consider the following FDR task $\Pi = \langle V, A, s_0, g \rangle$, where
  * $V=\{v_1, v_2, v_3\}$, $dom(v_1) = \{d,e\}, dom(v_2)=\{f,g\}, dom(v_3)=\{h,j,k\}$,
  * $s_0 = \{v_1=d,v_2=f,v_3=h\}$,
  * $g = \{v_1=d,v_3=k\}$,
  * $A = \{a_1,a_2,a_3,a_4,a_5\}$ defined below:

^ action ^ pre ^ add ^ cost ^
| $a_1$ | $v_1=d,v_3=h$ | $v_1=e,v_3=j$ | $2$ |
| $a_2$ | $v_1=d$ | $v_2=g$ | $1$ |
| $a_3$ | $v_2=g,v_3=j$ | $v_3=k$ | $1$ |
| $a_4$ | $v_1=e$ | $v_1=d$ | $2$ |
| $a_5$ | $v_3=h$ | $v_3=j$ | $5$ |

Given projection $P=\{v_3\}$, compute the heuristic $h_{P}(s_0)$ and post-hoc heuristic.

++++ Solution - heuristic|
$P=\{v_3\}$, $s_0 = \{v_3=h\}, g=\{v_3=k\}$, and $A$ is

^ action ^ pre ^ add ^ cost ^
| $a_1$ | $v_3=h$ | $v_3=j$ | $2$ |
| $a_3$ | $v_3=j$ | $v_3=k$ | $1$ |


The constructed LTS is following $ \{v_3=h\} \stackrel{a_1}{\longrightarrow} \{v_3=j\} \stackrel{a_3}{\longrightarrow} \{v_3=k\}$. And $h_{P}(s_0) = \mathsf{cost}(a_1) + \mathsf{cost}(a_3) = 2 + 1 = 3$.


++++

++++ Solution - post-hoc heuristic |
  * For $P \in \{\{v_3\}\}$, compute $h_{P}(s_0)$: $h_{\{v_3\}}(s_0) = 3$

  * What actions affect $P$? $A_P = \{a_1,a_3,a_5\}$

  * Create linear program, where variables correspond to $A_P$: $X_1, X_3, X_5$.

\[ \min 2X_1 + X_3 + 5X_5 \qquad\text{subject to} \] 
\begin{align*} 
  2X_1 + X_3 + 5X_5 \geq 3
\end{align*}

The optimal value of LP is 3. 

++++

==== Stochastic shortest path (SSP)  ====

**Exercise:** Wanting to read a book as soon as possible, we decide to go and get it. We can either take bus or bike to the city center, where there are a library as well as a shop. The problem is modelled as the following LTS.

{{ :courses:pui:tutorials:ssp2.png?400 |}}


At first, compute the heuristic values of each state $S = \{home, station, bus, library, shop, reading\}$.

++++ Solution - heuristic values |

^ state $s$  ^ $home$  ^ $station$  ^ $bus$ ^ $library$ ^ $shop$  ^ $reading$ ^
| $h(s)$ | 21 | 26 | 16 | 1 | 2 | 0 |

++++

Then perform ILAO$^*$ and get optimal policy $\pi^*$.

++++ Solution - ILAO$^*$ |

Recall two steps of ILAO$^*$:
  - Determine solution graph
  - Do Value Iteration until policy $\pi$ of any $s$ is changed (then create new solution graph) or the maximal change in Bellman update of $V$ is less than $\epsilon = 0.01$ (then terminate algorithm).


  * Solution graph:
{{ :courses:pui:tutorials:ssp-sg.png?400 |}}

  * Value iteration of states in solution graph ($home$, $station$, $bus$, $library$, $reading$) according to $Q(s,a) = \textrm{min}_{a \in A} \mathsf{cost}(a) + \sum_{t \in T(s,a)} P(t|s,a) V(t)$.

^   ^ $home$  ^ $station$  ^ $bus$ ^ $library$ ^ $reading$ ^ $\pi(home)$  ^ $\pi(station)$  ^ $\pi(bus)$ ^ $\pi(library)$ ^ $\pi(reading)$ ^ Res($home$)  ^ Res($station$)  ^ Res($bus$) ^ Res($library$) ^ Res($reading$) ^
| $h(s)$ | 21 | 26 | 16 | 1 | 0 | - | - | - | -| - | -  | -| - | - | - |  
| $V_1(s)$ | 24 | 27 | 16 | 1.4 | 0 | bus | wait | take | borrow | - |**3** | 1 | 0 | 0.4 | 0 |
| $V_2(s)$ | 24.3 | 27.1 | 16.4 | 1.56 | 0 | bus | wait | take | borrow | - |0.3 | 0.1 | **0.4** | 0.16 | 0 |
| $V_3(s)$ | 24.61 | 27.47 | 16.56 | 1.624 | 0 | bus | wait | take | borrow | - |0.31 | **0.37** | 0.16 | 0.064 | 0 |
| ... |
| $V_{\infty}(s)$ | 25 | 28 | 17| 1.7 | 0 | bus | wait | take | borrow | - | $\approx 0.01$ | $\approx 0.01$ | $\approx 0.01$ | $\approx 0.01$ | 0 |


Optimal policy is $\pi^*(home) = go to bus$, $\pi^*(station) = wait$, $\pi^*(bus) = take bus$, $\pi^*(library) = borrow$.


++++