courses:pui:tutorials:tutorial02 [CourseWare Wiki]

2. Heuristic Search

Exercise 1: Find an LTS $\Theta=\langle S,A,T,G\rangle$ and a heuristic $h\colon S\to\mathbb{R}_0^+\cup\{\infty\}$ which is admissible but not consistent. Make the LTS $\Theta$ so that GBFS fails to find an optimal $s_0$-plan for an initial state $s_0$.

Solution

The simplest example of an LTS and an admissible and non-consistent heuristic is simply an LTS built on a chain $a\to b\to c$ where $c$ is the only goal state. Suppose that the transitions cost $1$. Then $h(a)=2$, $h(b)=0$, $h(c)=0$ is an admissible heuristic that is not consistent because $h(a)-h(b)=2-0=2>1$. However, this simple LTS and the heuristic cannot puzzle GBFS, as only a single path goes from $a$ to $c$.

Consider the following LTS $\Theta=\langle S,A,T,G\rangle$ where $S=\{a,b,c,d\}$, $A=\{\alpha,\beta,\gamma,\delta\}$, $G=\{d\}$, and the transitions $T$ are shown in the digraph below:

The costs of the actions are defined:

action	$\alpha$	$\beta$	$\gamma$	$\delta$
cost	1	1	3	2

The heuristic $h$ is defined:

state	$a$	$b$	$c$	$d$
$h$	3	3	0	0

It is easy to check that $h$ is admissible. To see that it is not consistent, consider the transition $\langle b,\beta,c\rangle\in T$. We have $h(b)-h(c)=3-0=3> \mathsf{cost}(\beta)$.

Exercise 2: Let $\Theta$ be your above LTS and $h$ the admissible non-consistent heuristic. Choose an initial state and consider the algorithms GBFS and A$^*$. Show how the data structures that keep the state of their computation change in every iteration.

Solution

As an initial state, we choose $a\in S$. GBFS employs two data structures to keep its state of the computation: the priority queue $Open$ for states to be expanded, a set $Closed$ for already expanded states, and $Parent$ storing for each expanded state its predecessor and action. The elements of the priority queue are pairs of the form $(p,s)$ where $p$ is the priority and $s$ is a state.

Iteration	$Open$	$Closed$	$Parent$
0	$\{(3,a)\}$	$\emptyset$	$\emptyset$
1	$\{(0,c),(3,b)\}$	$\{a\}$	$Parent(b)=(a,\alpha)$, $Parent(c)=(a,\gamma)$
2	$\{(0,d),(3,b)\}$	$\{a,c\}$	$Parent(b)=(a,\alpha)$, $Parent(c)=(a,\gamma)$, $Parent(d)=(c,\delta)$

In the final iteration, the goal state $d$ is removed from $Open$, and the algorithm terminates. Thus the plan returned by GBFS is $\pi=\gamma,\delta$ and $\mathsf{cost}(\pi)=3+2=5$, i.e., non-optimal plan.

A$^*$ replaces $Closed$ with gScores.

Iteration	$Open$	$gScores$	$Parent$
0	$\{(3,a)\}$	$g(a)=0$, $g(b)=\infty$, $g(c)=\infty$, $g(d)=\infty$	$\emptyset$
1	$\{(3,c),(4,b)\}$	$g(a)=0$, $g(b)=1$, $g(c)=3$, $g(d)=\infty$	$Parent(b)=(a,\alpha)$, $Parent(c)=(a,\gamma)$
2	$\{(4,b),(5,d)\}$	$g(a)=0$, $g(b)=1$, $g(c)=3$, $g(d)=5$	$Parent(b)=(a,\alpha)$, $Parent(c)=(a,\gamma)$, $Parent(d)=(c,\delta)$
3	$\{(2,c),(5,d)\}$	$g(a)=0$, $g(b)=1$, $g(c)=2$, $g(d)=5$	$Parent(b)=(a,\alpha)$, $Parent(c)=(b,\beta)$, $Parent(d)=(c,\delta)$
4	$\{(4,d)\}$	$g(a)=0$, $g(b)=1$, $g(c)=2$, $g(d)=4$	$Parent(b)=(a,\alpha)$, $Parent(c)=(b,\beta)$, $Parent(d)=(c,\delta)$

Note that the 2nd iteration expands the state $b$ and must reopen $c$ again as the path through $b$ is cheaper than the direct path from $a$. In the 3rd iteration, the algorithm expands $c$, finds a cheaper path to $d$, and updates the priority of $d$ in $Open$.

In the final iteration, the goal state $d$ is removed from $Open$, and the algorithm terminates. Thus the plan returned by A$^*$ is $\pi^*=\alpha,\beta,\delta$ and $\mathsf{cost}(\pi)=1+1+2=4$, i.e., the optimal plan.

Exercise 3: Consider A* with zero heuristic and unit traversal costs, and breadth-first-search. What, if any, is the difference between the two?

Solution

Exercise 4: We have two admissible heuristics, $h_a$ and $h_b$, such that $\forall x \in S: h_a(x) \geq h_b(x)$, in other words, $h_a$ returns same or more accurate estimate of the actual costs. Does this mean that $h_b$ is worse in terms of how long it takes A* to find a solution using that heuristic? Can you proove that? If not, can you provide a counter-example?

Exercise 5: Consider two admissible and consisten heuristics $h_a$ and $h_b$.

is the heuristic $h_m = max(h_a, h_b)$ admissible/consistent?
is the heuristic $h_m = \alpha(h_a) + (1 - \alpha)(h_b)$ admissible/consistent for $0 \leq \alpha \leq 1$?

Solution

Sliding puzzle heuristics

Consider the sliding puzzle as a planning problem.

question 1: How many potential states are there? Are all of them reachable?

question 2: Try to come up with some heuristics for this specific planning task.

question 3: Consider the following heuristics:

(hidden because of question 2)

Are they admissible? Are they consistent (think about some arguments/potential counter-examples to support your opinions)?

qeustion 4: Try to come up with a different heuristic that is both inconsistent and reasonable to use.

question 5: So far we assumed that only one tile can move at a time. Would all the heuristics discussed so far change their properties if we were allowed to move an entire line/column at once? (That is, (up to) two tiles for a 3×3 puzzle, up to three tiles for 4×4 puzzle…)