===== 5. Fast-forward heuristic and action landmarks ==== In this tutorial, we will compute the fast-forward heuristic based on the best supporters of facts. Let $\Pi=\langle F,A,s_0,g\rangle$ be a STRIPS task and $s\subseteq F$ a state. When computing $h^{max}(s)$, we must compute the fixed point $\boldsymbol{s}^*$ of $\Gamma_{max}$. Recall that the best supporter $bs(p)$ of a fact $p\not\in s$ is the last action $a\in A$ responsible for the value $\boldsymbol{s}^*(p)<\infty$. If $p\in s$ or $\boldsymbol{s}^*(p)=\infty$, the best supporter is undefined which we denote as $bs(p)=\bot$. **Exercise 1:** Consider the following STRIPS task $\Pi=\langle F,A,s_0,g\rangle$ where * $F=\{a,b,c,d,e\}$, * $s_0 = \{a\}$, * $g = \{b,e\}$, * $A = \{o_1,o_2,o_3,o_4\}$ defined in the table below: ^ action ^ pre ^ add ^ cost ^ | $o_1$ | $\emptyset$ | $\{b,c\}$ | $4$ | | $o_2$ | $\{a\}$ | $\{c\}$ | $2$ | | $o_3$ | $\{a,c\}$ | $\{d\}$ | $3$ | | $o_4$ | $\{c,d\}$ | $\{e\}$ | $1$ | Compute the perfect heuristic $h^{ff}$ for the initial state, i.e., $h^{ff}(s_0)$. ++++ Solution| We will iteratively apply the operator $\Gamma_{max}$ to compute the fixed point. During the computation, we will maintain the last actions responsible for the updated values of the current enriched state. Once we get the fixed point, we get the final best supporters. ^ iteration ^ $a$ ^ $b$ ^ $c$ ^ $d$ ^ $e$ ^ $bs(b)$ ^ $bs(c)$ ^ $bs(d)$ ^ $bs(e)$ ^ | $0$ | $0$ | $4$ | $4$ | $\infty$ | $\infty$ | $o_1$ | $o_1$ | $\bot$ | $\bot$ | | $1$ | $0$ | $4$ | $2$ | $7$ | $\infty$ | $o_1$ | $o_2$ | $o_3$ | $\bot$ | | $2$ | $0$ | $4$ | $2$ | $5$ | $8$ | $o_1$ | $o_2$ | $o_3$ | $o_4$ | | $3$ | $0$ | $4$ | $2$ | $5$ | $6$ | $o_1$ | $o_2$ | $o_3$ | $o_4$ | * **0th row:** We get the initial best supporters from the actions with empty preconditions. That's why $bs(b)$ and $bs(c)$ are $o_1$ as $b,c$ can be produced by $o_1$. The remaining best supporters are undefined. * **1st row:** Note that we updated the values for $c$ and $d$. The responsible actions for these updates are $o_2$ and $o_3$. * **2nd row:** Note that we updated the values for $d$ and $e$. The responsible actions for these updates are $o_3$ and $o_4$. * **3rd row:** Note that we updated the value for $e$. The responsible action for this update is $o_4$. We will apply the best supporter backward from the goal state until we get a subset of the initial state. We always choose the best supporter of the most expensive fact. In the following, the superscripts of facts are their fixed-point values. \[\{b^4,e^6\}\stackrel{o_4}{\longleftarrow} \{b^4,c^2,d^5\}\stackrel{o_3}{\longleftarrow} \{a^0,b^4,c^2\}\stackrel{o_1}{\longleftarrow} \{a\}\subseteq s_0\] Note that the sequence $o_1,o_3,o_4$ is an $s_0$-plan. Its cost is $4+3+1=8$. Thus $h^{ff}(s_0)=8$. ++++ **Exercise 2:** Consider the following STRIPS task $\Pi=\langle F,A,s_0,g\rangle$ where * $F=\{a,b,c,d,e\}$, * $s_0 = \{a\}$, * $g = \{d,e\}$, * $A = \{o_1,o_2,o_3,o_4,o_5\}$ defined in the table below: ^ action ^ pre ^ add ^ cost ^ | $o_1$ | $\emptyset$ | $\{b\}$ | $2$ | | $o_2$ | $\{a\}$ | $\{b,c\}$ | $4$ | | $o_3$ | $\{a,b,c\}$ | $\{d,e\}$ | $3$ | | $o_4$ | $\{b,c\}$ | $\{e\}$ | $2$ | | $o_5$ | $\{e\}$ | $\{b,c\}$ | $2$ | Compute $h^{ff}(s_0)$. ++++ Solution| For $h^{max}$ and the best supporters, we have the following computation: ^ iteration ^ $a$ ^ $b$ ^ $c$ ^ $d$ ^ $e$ ^ $bs(b)$ ^ $bs(c)$ ^ $bs(d)$ ^ $bs(e)$ ^ | $0$ | $0$ | $2$ | $\infty$ | $\infty$ | $\infty$ | $o_1$ | $\bot$ | $\bot$ | $\bot$ | | $1$ | $0$ | $2$ | $4$ | $\infty$ | $\infty$ | $o_1$ | $o_2$ | $\bot$ | $\bot$ | | $2$ | $0$ | $2$ | $4$ | $7$ | $6$ | $o_1$ | $o_2$ | $o_3$ | $o_4$ | \[\{d^7,e^6\}\stackrel{o_3}{\longleftarrow} \{a^0,b^2,c^4\}\stackrel{o_2}{\longleftarrow} \{a\}\subseteq s_0\] Thus $h^{ff}(s_0) = 4+3=7$. ++++ **Exercise 3:** Consider the STRIPS task $\Pi=\langle F,A,s_0,g\rangle$ where * $F = \{a,b,c,d,e\}$, * $s_0 = \{a\}$, * $g=\{c,d,e\}$, * $A=\{o_1,o_2,o_3,o_4\}$ ^ action ^ pre ^ add ^ cost ^ | $o_1$ | $\{a\}$ | $\{b,c\}$ | $1$ | | $o_2$ | $\{a\}$ | $\{d\}$ | $2$ | | $o_3$ | $\{b\}$ | $\{c,d\}$ | $3$ | | $o_4$ | $\{a,b\}$ | $\{e\}$ | $4$ | Find several action landmarks for $s_0$ induced by an $s$-$G$-cut and compute its elementary landmark heuristic at $s_0$. ++++ Solution| We need to split the state space $S=2^F$ of the LTS induced by $\Theta$ into two disjoint sets $U$ and $S\setminus U$ such that $s_0\in U$ and $G\cap U=\emptyset$. There are many choices for doing that. However, for some choices, it is easier to find the corresponding cut-set, i.e., its action landmark. One such possibility is to select a fact $p\in g\setminus s_0$ and split the states based on its membership. For instance, consider $c\in g$ and define \[U_c=\{s\subseteq F\mid c\not\in s\}.\] Clearly, $s_0\in U$ and $G\cap U=\emptyset$. It is also easy to find an action landmark $L_c$ containing the cut-set of $U_c$, as the transitions leading from $U$ outside of $U$ are the transitions labeled by an action $a$ such that $c\in\mathsf{add}_a$. Thus \[L_c=\{a\in A\mid c\in\mathsf{add}_a\}=\{o_1,o_3\}.\] Its elementary landmark heuristic $h_{L_c}$ evaluated in $s_0$ gives us $h_{L_c}(s_0)=\min\{1,3\}=1$. Analogously, starting with $d\in g$, we obtain \[L_d=\{a\in A\mid d\in\mathsf{add}_a\}=\{o_2,o_3\},\text{ and } h_{L_d}(s_0)=\min\{2,3\}=2.\] Finally, starting with $e\in g$. we obtain \[L_e=\{a\in A\mid e\in\mathsf{add}_a\}=\{o_4\},\text{ and } h_{L_e}(s_0)=\min\{4\}=4.\] ++++ **Exercise 4:** For the action landmarks from the previous exercise, compute the saturated cost partitioning of the cost function and evaluate the cost partitioned heuristic at $s_0$. ++++ Solution| Given an action landmark $L$ for a state $s$ and a cost function $\mathsf{cost}$, the saturated cost function for $h_L$ is the function \[\mathsf{scf}(a)=\begin{cases} m & \text{if } a\in L,\\ 0 & \text{otherwise.} \end{cases}\] where $m = \min\{\mathsf{cost}(a)\mid a\in L\}$. Thus we can partition $\mathsf{cost} = \mathsf{scf} + \mathsf{rem}$ where \[\mathsf{rem}(a)=\begin{cases} \mathsf{cost}(a)-m & \text{if } a\in L,\\ \mathsf{cost}(a) & \text{otherwise.} \end{cases}\] We can apply the above partitioning iteratively for all the action landmarks in a chosen order, e.g., $L_c,L_d,L_e$. We partition $\mathsf{cost}$ for $L_c$ as $\mathsf{cost} = \mathsf{scf}_c + \mathsf{rem}_c$ where ^ $\mathsf{scf}_c(o_1)$ ^$\mathsf{scf}_c(o_2)$ ^ $\mathsf{scf}_c(o_3)$ ^ $\mathsf{scf}_c(o_4)$ ^ $\mathsf{rem}_c(o_1)$ ^$\mathsf{rem}_c(o_2)$ ^ $\mathsf{rem}_c(o_3)$ ^ $\mathsf{rem}_c(o_4)$ ^ | $1$ | $0$ | $1$ | $0$ | $0$ | $2$ | $2$ | $4$ | Next we further partition $\mathsf{rem}_c$ using $L_d$ obtaining $\mathsf{rem}_c = \mathsf{scf}_d + \mathsf{rem}_d$ where ^ $\mathsf{scf}_d(o_1)$ ^$\mathsf{scf}_d(o_2)$ ^ $\mathsf{scf}_d(o_3)$ ^ $\mathsf{scf}_d(o_4)$ ^ $\mathsf{rem}_d(o_1)$ ^$\mathsf{rem}_d(o_2)$ ^ $\mathsf{rem}_d(o_3)$ ^ $\mathsf{rem}_d(o_4)$ ^ | $0$ | $2$ | $2$ | $0$ | $0$ | $0$ | $0$ | $4$ | /* Finally, we partition $\mathsf{rem}_d$ using $L_e$ obtaining $\mathsf{rem}_d = \mathsf{scf}_e + \mathsf{rem}_e$ where ^ $\mathsf{scf}_e(o_1)$ ^$\mathsf{scf}_e(o_2)$ ^ $\mathsf{scf}_e(o_3)$ ^ $\mathsf{scf}_e(o_4)$ ^ $\mathsf{rem}_e(o_1)$ ^$\mathsf{rem}_e(o_2)$ ^ $\mathsf{rem}_e(o_3)$ ^ $\mathsf{rem}_e(o_4)$ ^ | $0$ | $0$ | $1$ | $1$ | $0$ | $0$ | $0$ | $2$ | */ Altogether we have $\mathsf{cost} = \mathsf{scf}_c +\mathsf{scf}_d + \mathsf{rem}_d$ and the cost partitioned heuristic is $h(\mathsf{cost}, s_0)=h_{L_c}(\mathsf{scf}_c, s_0) + h_{L_d}(\mathsf{scf}_d, s_0) + h_{L_e}(\mathsf{rem}_d, s_0)=1 + 2 + 4=7$. ++++