courses:pui:tutorials:tutorial13 [CourseWare Wiki]

13./14. Lets prepare for exam - Exercises

In this tutorial, we will go over several heuristics of classical planning and probabilistic planning as a preparation for exam. If you have your own exercises or you struggle with any topic, prepare it for the tutorial.

Delete relaxation heuristics

A relaxed planning task of STRIPS task $\Pi = \langle F, A, s_0, g \rangle$, where $a = \langle pre_a, add_a, del_a \rangle$ is a task $Pi^+ = \langle F, A^+, s_0, g \rangle$, where the delete effects of actions are omitted, i.e., $a^+ = \langle pre_{a^+}, add_{a^+}, \emptyset \rangle$. The optimal heuristic of $\Pi$ is denoted $h^*$ and for $\Pi^+$. However, the heuristic values cannot be computed in polynomial time, therefore, we use estimates: $h^{max}, h^{add}, h^{ff}$. In this tutorial, we consider $\Pi = \Pi^+$ and $h^{max} \leq h^+ = h^* \leq h^{add}$ and $h^+ = h^* \leq h^{ff}$ hold.

Exercise: Consider the following STRIPS task $\Pi = \langle F, A, s_0, g \rangle$, where

$F=\{a,b,c,d,e,f,g,h\}$,
$s_0 = \{a\}$,
$g = \{c,d,e,f,g\}$,
$A = \{a_1,a_2,a_3,a_4,a_5,a_6\}$ defined below:

action	pre	add	cost
$a_1$	$\{a\}$	$\{b,c\}$	$1$
$a_2$	$\{a,c\}$	$\{d\}$	$1$
$a_3$	$\{b,c\}$	$\{e\}$	$1$
$a_4$	$\{b\}$	$\{f\}$	$1$
$a_5$	$\{d\}$	$\{e,f\}$	$1$
$a_6$	$\{d\}$	$\{g\}$	$1$

Compute the perfect heuristic $h^*$, $h^{max}$, $h^{add}$, and $h^{ff}$ for the initial state, i.e., $h^*(s_0), h^{max}(s_0),h^{add}(s_0)$ and $h^{ff}(s_0)$.

Solution

LM-Cut Heuristic

Exercise: Consider the following STRIPS task $\Pi = \langle F, A, s_0, g \rangle$, where

$F=\{a,b,c,d,e,f\}$,
$s_0 = \{c,d\}$,
$g = \{a,e\}$,
$A = \{a_1,a_2,a_3,a_4,a_5\}$ defined below:

action	pre	add	cost
$a_1$	$\{d\}$	$\{a,e\}$	$3$
$a_2$	$\{c,d\}$	$\{e\}$	$1$
$a_3$	$\{d,e\}$	$\{a,b\}$	$1$
$a_4$	$\{b\}$	$\{d,f\}$	$1$
$a_5$	$\{b,e\}$	$\{f\}$	$2$

Compute $h^{LM-Cut}(s_0)$.

Do not forget to extend $\Pi$ by the by new facts $\bot$ and $\top$ so the new initial state is $\{\bot\}$ and the new goal state is $\{\top\}$.

Solution - Extend $\Pi$

$F = \{a,b,c,d,e,f,\top,\bot\}, s_0 = \{\bot\}, g = \{\top\}$, and $A = A \cup \{a_{\bot}, a_{\top}\}$, where

action	pre	add	cost
$a_{\bot}$	$\bot$	$\{c,d\}$	$0$
$a_{\top}$	${\a,e\}$	$\top$	$0$

Solution - $1^{st}$-iteration

action	$a_{\bot}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$	$a_{\top}$
$\mathsf{cost}$	0	3	1	1	1	2	0

$h^{max}$: $h^{max}(\top) = 2$

	$a$	$b$	$c$	$d$	$e$	$f$	$\top$
init	$\infty$	$\infty$	$\infty$	$\infty$	$\infty$	$\infty$	$\infty$
1	$\infty$	$\infty$	0	0	$\infty$	$\infty$	$\infty$
2	3	$\infty$	0	0	1	$\infty$	$\infty$
3	2	2	0	0	1	$\infty$	$\infty$
4	2	2	0	0	1	$\infty$	2
5	2	2	0	0	1	3	2

landmarks $L$ from justification graph

action	$a_{\bot}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$	$a_{\top}$
choice ($\textrm{argmax}_{p \in pre_a} \boldsymbol{\sigma}(p))$	$\bot$	$d$	$c$	$e$	$b$	$b$	$e$

(If tie-break use and lexicographical order.)

From justification graph $\rightarrow L = \{a_1, a_2\}$ and $\textrm{min}\{a_1,a_2\} = 1$.

Update action costs

action	$a_{\bot}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$	$a_{\top}$
$\mathsf{cost}$	2	0	1	1	1	2	0

Patterns and LP-based heuristics

Exercise 1: Consider the following FDR task $\Pi = \langle V, A, s_0, g \rangle$, where

$V=\{v_a, v_b, v_c, v_d, v_e, v_f\}$, $dom(v_i) = \{0,1\}, \forall i \in \{a,b,c,d,e,f \}$,
$s_0 = \{v_a=0,v_b=0,v_c=1,v_d=1,v_e=0,v_f=0\}$,
$g = \{v_a=1,v_e=1\}$,
$A = \{a_1,a_2,a_3,a_4,a_5\}$ defined below:

action	pre	add	cost
$a_1$	$\{v_d=1\}$	$\{v_a=1,v_e=1\}$	$3$
$a_2$	$\{v_c=1,v_d=1\}$	$\{v_e=1\}$	$1$
$a_3$	$\{v_d=1,v_e=1\}$	$\{v_a=1,v_b=1\}$	$1$
$a_4$	$\{v_b=1\}$	$\{v_d=1,v_f=1\}$	$1$
$a_5$	$\{v_b=1,v_e=1\}$	$\{v_f=1\}$	$2$

Create a causal graph of $\Pi$, where vertices corresnponds to $V$ and edges to actions $a_k$, where for each $v_i,v_j \in V$ either $\mathsf{pre}_{a_k}(v_i)$ and $\mathsf{eff}_{a_k}(v_j)$ are defined or $\mathsf{eff}_{a_k}(v_i)$ and $\mathsf{eff}_{a_k}(v_j)$ are defined. And from the graph select interesting patterns up to size 2.

Solution - interesting patterns

Take the interesting patterns $P$ and used them to form a linear program to compute post-hoc heuristic value.

Solution - post-hoc heuristic

Compute pattern database heuristic $h_P(s_0)$ for each interesting pattern $P \in P_{\rm up 2}$ from LTS.

LTS:

$P = \{v_a\}$:
```
 0 --(a3)--> 1 
```
$P = \{v_e\}$:
```
 0 --(a2)--> 1 
```
$P = \{v_a,v_b\}$:
```
 
00 -(a1)-> 10 
 |
(a3)
 |
 v
11
```
$P = \{v_a,v_d\}$:
```
  01 -(a3)-> 11 
```

$P = \{v_a,v_e\}$:

 
00 -(a2)-> 01 -(a3)-> 11
 |                  ^
 |                  |
  --------(a1)------

$P = \{v_b,v_e\}$:
```
 00 -(a2)-> 01 
```
$P= \{v_c,v_e \}$:
```
 00 -(a1) -> 01
10 -(a2)-> 11
```
$P = \{v_d,v_e\}$:
```
 00 -(a4)-> 10 -(a2)-> 11 
```

Heuristic:

$P \in P_{\rm up2}$	$\{v_a\}$	$\{v_e\}$	$\{v_a,v_b\}$	$\{v_a,v_d\}$	$\{v_a,v_e\}$	$\{v_b,v_e\}$	$\{v_c,v_e\}$	$\{v_d,v_e\}$
$h_P(s_0)$	1	1	1	1	2	1	1	1

What actions affect $P_{\rm up2}$?

$P \in P_{\rm up2}$	$\{v_a\}$	$\{v_e\}$	$\{v_a,v_b\}$	$\{v_a,v_d\}$	$\{v_a,v_e\}$	$\{v_b,v_e\}$	$\{v_c,v_e\}$	$\{v_d,v_e\}$
$a_1$	✘	✘	✘	✘	✘	✘	✘	✘
$a_2$	✘	✔	✘	✘	✔	✔	✔	✔
$a_3$	✔	✘	✔	✔	✔	✘	✘	✘
$a_4$	✘	✘	✘	✘	✘	✘	✘	✘
$a_5$	✘	✘	✘	✘	✘	✘	✘	✘

Create LP and compute its optimal value.

\[ \min 3X_1 + X_2 + X_3 + X_4 + 2X_5 \qquad\text{subject to} \] \begin{align*} X_3 \geq 1 \\ X_2 \geq 1 \\ X_3 \geq 1 \\ X_3 \geq 1 \\ X_2 + X_3\geq 2 \\ X_2 \geq 1 \\ X_2 \geq 1 \\ X_2 \geq 1 \end{align*}

Solution of LP is 3.

Exercise 2: Consider the following FDR task $\Pi = \langle V, A, s_0, g \rangle$, where

$V=\{v_1, v_2, v_3\}$, $dom(v_1) = \{d,e\}, dom(v_2)=\{f,g\}, dom(v_3)=\{h,j,k\}$,
$s_0 = \{v_1=d,v_2=f,v_3=h\}$,
$g = \{v_1=d,v_3=k\}$,
$A = \{a_1,a_2,a_3,a_4,a_5\}$ defined below:

action	pre	add	cost
$a_1$	$v_1=d,v_3=h$	$v_1=e,v_3=j$	$2$
$a_2$	$v_1=d$	$v_2=g$	$1$
$a_3$	$v_2=g,v_3=j$	$v_3=k$	$1$
$a_4$	$v_1=e$	$v_1=d$	$2$
$a_5$	$v_3=h$	$v_3=j$	$5$

Given projection $P=\{v_3\}$, compute the heuristic $h_{P}(s_0)$ and post-hoc heuristic.

Solution - heuristic

$P=\{v_3\}$, $s_0 = \{v_3=h\}, g=\{v_3=k\}$, and $A$ is

action	pre	add	cost
$a_1$	$v_3=h$	$v_3=j$	$2$
$a_3$	$v_3=j$	$v_3=k$	$1$

The constructed LTS is following $ \{v_3=h\} \stackrel{a_1}{\longrightarrow} \{v_3=j\} \stackrel{a_3}{\longrightarrow} \{v_3=k\}$. And $h_{P}(s_0) = \mathsf{cost}(a_1) + \mathsf{cost}(a_3) = 2 + 1 = 3$.

Solution - post-hoc heuristic

Stochastic shortest path (SSP)

Exercise: Wanting to read a book as soon as possible, we decide to go and get it. We can either take bus or bike to the city center, where there are a library as well as a shop. The problem is modelled as the following LTS.

At first, compute the heuristic values of each state $S = \{home, station, bus, library, shop, reading\}$.

Solution - heuristic values

Then perform ILAO$^*$ and get optimal policy $\pi^*$.

Solution - ILAO$^*$

Recall two steps of ILAO$^*$:

Determine solution graph
Do Value Iteration until policy $\pi$ of any $s$ is changed (then create new solution graph) or the maximal change in Bellman update of $V$ is less than $\epsilon = 0.01$ (then terminate algorithm).

Solution graph:

Value iteration of states in solution graph ($home$, $station$, $bus$, $library$, $reading$) according to $Q(s,a) = \textrm{min}_{a \in A} \mathsf{cost}(a) + \sum_{t \in T(s,a)} P(t|s,a) V(t)$.

	$home$	$station$	$bus$	$library$	$reading$	$\pi(home)$	$\pi(station)$	$\pi(bus)$	$\pi(library)$	$\pi(reading)$	Res($home$)	Res($station$)	Res($bus$)	Res($library$)	Res($reading$)
$h(s)$	21	26	16	1	0	-	-	-	-	-	-	-	-	-	-
$V_1(s)$	24	27	16	1.4	0	bus	wait	take	borrow	-	3	1	0	0.4	0
$V_2(s)$	24.3	27.1	16.4	1.56	0	bus	wait	take	borrow	-	0.3	0.1	0.4	0.16	0
$V_3(s)$	24.61	27.47	16.56	1.624	0	bus	wait	take	borrow	-	0.31	0.37	0.16	0.064	0
…
$V_{\infty}(s)$	25	28	17	1.7	0	bus	wait	take	borrow	-	$\approx 0.01$	$\approx 0.01$	$\approx 0.01$	$\approx 0.01$	0

Optimal policy is $\pi^*(home) = go to bus$, $\pi^*(station) = wait$, $\pi^*(bus) = take bus$, $\pi^*(library) = borrow$.

Table of Contents

13./14. Lets prepare for exam - Exercises

Delete relaxation heuristics

LM-Cut Heuristic

Patterns and LP-based heuristics

Stochastic shortest path (SSP)