T4c-patrol - Patrolling in polygonal environment

The task is to implement the computation of turning probability in perimeter patrolling.

Deadline	January 7, 23:59 PST
Points	3
Label in BRUTE	t4c-patrol
Evaluation	upload strategy computation in `strategy.py` to brute
Resources	none

Assignment

Implement following algorithm to compute turning probability $p$.

Create a matrix of size [2d + 1, 2d + 1], initialized with 0s
// Fill out all entries in M as follows:
M[2d, 2d] = 1
for i = 0, i < 2d, i++ do
    direction = (i mod 2==0 ? 1 : -1)
    M[i,(2*direction+i) mod (2d+1)] = p
    M[i,(direction+i) mod (2d+1)] = 1-p
end
// M to the power of t to calculate state probabilities after t steps
// in a sense of a matrix multiplication not only elementwise
MT = M^t  
Res = vector of size d initialized with 0s
// Collect resulting probabilities
for loc = 0, loc < d, loc++ do
    Res[loc] = MT[2loc][2d]
end
return Res

To solve the problem, implement the following two functions, generate_function and optimal_strategy and upload them to BRUTE in file strategy.py. The generate_function function is prescribed as follows:

def generate_function(d, t, p):
    """
    Function to generate the probability of stopping an attack for each segment given some p
 
    Parameters
    ----------
    d:int
        number of spaces between two closest agents
    t:int
        the time required by the attacker to breach the fence
    p:float
        probability of continuing forward
 
    Returns
    -------
    f:float[d]
        list of probabilities of stopping an attack for each segment

example behavior: input: $(3,2,0.5)$, returns: $[0.25,0.25,0.5]$.

example behavior: input: $(4,2,0.25)$, returns: $[0.1875, 0.0, 0.0625, 0.25]$.

The optimal_strategy function has the following prescription

def optimal_strategy(d, t, splits):
    """
    Function to generate the probability of stopping an attack for each segment given some p
 
    Parameters
    ----------
    d:int
        number of spaces between two closest agents
    t:int
        the time required by the attacker to breach the fence
    splits:int
        a number of values of p from 0 to 1 to be created, e.g. for splits = 3, we will have p in [0,0.5,1]
 
    Returns
    -------
    f_all:float[d][splits]
        list of lists, one for each segment, and in the list for the segment are probabilities of stopping an attack for all possible p
    f_min:float[splits]
        list of minimal probabilities of stopping an attack for each p
    strategy:float
        value of the optimal p
    prob:float
        the worst-case probability of stopping an attack given strategy

example behaviors: input: $(3,2,3)$, returns: $([[0.0, 0.25, 0.0], [0.0, 0.25, 1.0], [0.0, 0.5, 1.0]], [0.0, 0.25, 0.0], 0.5, 0.25)$

input: $(3,2,5)$, returns: $([[0.0, 0.1875, 0.25, 0.1875, 0.0], [0.0, 0.0625, 0.25, 0.5625, 1.0], [0.0, 0.25, 0.5, 0.75, 1.0]], [0.0, 0.0625, 0.25, 0.1875, 0.0], 0.5, 0.25)$

Evaluation

The strategy computing functions should be submitted to Brute and if there is some mistake it will show you the input and wrong output on public testing scenarios.

Detailed description of the problem

The environment comprises a closed polygonal environment $P$ which is threatened by an adversary attack. To prevent this attack, the polygon $P$ is being patrolled by $k$ homogeneous mobile robots. The adversary needs to continuously attack a spot on the polygon for the duration $t$ to penetrate the polygon. The aim of this assignment is to compute such a patrolling strategy that minimizes the probability of a successful attack. Such a strategy comprises dividing the polygon into segments, and changing the direction of the robot patrol.

Polygon segmentation

The path around the polygon $P$ may be divided into $N$ segments, where a robot traverses each segment with uniform duration. Therefore, there exists such a time-periodic function that each robot traverses exactly one segment per its period. Moreover, the robot turning action has a non-zero duration of $\tau$. It is recommended to use the turning time $\tau$ as the time unit, i.e., $\tau = 1$.

Figure 1: Polygon environment and the division to segments. Note that on the polygon segments are homogeneously spaced since the segmentation is based on a uniform division of time and some actions, e.g., driving along curves, may take an increased amount of time.

The optimal strategy

The authors of [1] show that in an optimal strategy the robots are always uniformly distributed along the fence, and that at any time all the robots are always moving in a same direction. Let $d = \frac{N}{k}$ be the distance in segments between neighbor robots. Then if $t \geq d$ the strategy is trivial since it is sufficient for the robots to move forward in an arbitrary direction as such a behavior will always intercept any attacker. Conversely, if $t < \frac{d+\tau -1}{2}$, $\tau = 1$, and $t < \frac{d}{2}$ the attacker can always select a point on the polygon where the penetration succeeds. Assuming that neither of the trivial cases holds, the optimal strategy is a stochastic one that may be generalized as follows: for each segment, continue moving forward with the probability $p$, or turn all robots at once with the probability $1-p$.

To compute the optimal $p$, consider the segments between two neighbor robots. Note, due to the symmetric nature of the problem, this applies for all such segments. A Markov chain may be used to model the possible states and transitions between states in such system. For each segment $s_i$ in the original path, consider two states in the graphic model G; one state for being on $s_i$ and moving clockwise ($s_i^{cw}$), and one for moving counterclockwise ($s_i^{cc}$). Since $\tau = 1$, the edges of G are defined as follows. The robot direction change is represented by an $1-p$ probability edge, which leads from $s_i^{cc}$ to $s_i^{cw}$. Similarly, an $1-p$ probability edge leads from $s_i^{cw}$ to $s_i^{cc}$. Finally, edges with probability $p$ lead from $s_i^{cw}$ to $s_{i-1}^{cw}$, and from $s_i^{cc}$ to $s_{i+1}^{cc}$, respectively. Note, if the index of the target state of an edge would be lower than one or bigger than $d$, the edge goes instead to the absorbing state $s_{dt}$.

The algorithm returns a list of values, one for each of the $d$ segments. The value denotes the probability of stopping an attack in the segment given $p$. Therefore, to minimize the probability of a successful attack, optimize over values of $p$ from 0 to 1. Note, thousand or even hundred values should be sufficient. For each value of $p$ compute the worst-case segment, i.e., the segment with the lowers probability of stopping the attack. Then, choose $p$ that maximizes this worst-case.

a) All segment probabilities for all $p$s; b) the worst cases. The selected maximum over the worst cases corresponds to $p = 0.75$; c) matrix M after first iteration.

Reference

[1] Agmon, Noa, Gal A. Kaminka, and Sarit Kraus. “Multi-robot adversarial patrolling: facing a full-knowledge opponent.” Journal of Artificial Intelligence Research 42 (2011): 887-916.