The main task is to implement a Monte Carlo Tree Search (MCTS) policy for a robotics pursuit-evasion game.
In the file player/Player.py, in the function monte_carlo_policy, implement the MCTS policy decision making for the pursuit-evasion game.
The MCTS policy is a heuristic search algorithm for decision-making problems. In each discrete step of the game, the next-best state is selected based on simulated playouts.
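For illustration only, one MCTS decision typically repeats four phases (selection, expansion, simulation, backpropagation) until the time budget runs out. The following sketch shows that generic loop under assumed placeholders: Node, actions_fn, step_fn, and rollout_fn are not part of the provided codebase and stand in for your own state representation and helpers.

import math
import random
import time

class Node:
    """Hypothetical MCTS tree node; the state is any hashable game configuration."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> Node
        self.visits = 0
        self.value = 0.0     # accumulated rollout reward

def ucb(child, parent_visits, c=1.4):
    # standard UCT score used during the selection phase
    if child.visits == 0:
        return float('inf')
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts_decision(root_state, actions_fn, step_fn, rollout_fn, timeout):
    # assumes at least one action exists in root_state
    root = Node(root_state)
    clk = time.time()
    while (time.time() - clk) < timeout:
        # 1) selection - descend while the current node is fully expanded
        node = root
        while node.children and len(node.children) == len(actions_fn(node.state)):
            node = max(node.children.values(), key=lambda ch: ucb(ch, node.visits))
        # 2) expansion - add one action that has not been tried yet
        untried = [a for a in actions_fn(node.state) if a not in node.children]
        if untried:
            action = random.choice(untried)
            node.children[action] = Node(step_fn(node.state, action), parent=node)
            node = node.children[action]
        # 3) simulation - estimate the state value by a rollout (e.g., epsilon-greedy)
        reward = rollout_fn(node.state)
        # 4) backpropagation - update statistics on the path back to the root
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # the most visited action at the root is the selected move
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]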
The monte_carlo_policy function has the following prescription, which matches that of greedy_policy from Task11 - Greedy policy in pursuit-evasion:
def monte_carlo_policy(self, gridmap, evaders, pursuers):
    """
    Method to calculate the monte carlo tree search policy action

    Parameters
    ----------
    gridmap: GridMap
        Map of the environment
    evaders: list((int,int))
        list of coordinates of evaders in the game (except the player's robots, if he is evader)
    pursuers: list((int,int))
        list of coordinates of pursuers in the game (except the player's robots, if he is pursuer)
    """
The purpose of the function is to internally update the self.next_robots variable, which is a list of (int, int) robot coordinates, based on the current state of the game, given the grid map gridmap of the environment and the player's role self.role. The player is given the list evaders of all evading robots in the game other than its own robots and the list pursuers of all pursuing robots in the game other than its own robots. Hence, the complete set of robots in the game is the union of evaders, pursuers, and self.robots.
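For the tree it is convenient to represent the complete game configuration explicitly. The snippet below is only an assumed illustration of how such a joint state could be assembled from the inputs; the exact layout is not prescribed by the assignment.

# hypothetical joint game state assembled from the player's robots and all other robots;
# tuples are used so that the state is hashable and can serve as a dictionary key
state = (self.role,
         tuple(self.robots),   # the player's own robots
         tuple(evaders),       # all other evading robots
         tuple(pursuers))      # all other pursuing robots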
During the gameplay, each player is asked to update their intention for the next move, encoded in the self.next_robots variable, by calling the calculate_step function. Afterward, the step is performed by calling the take_step function, and the game checks each step for compliance with the rules of the game.
The game ends after a predefined number of steps or when all the evaders are captured.
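Conceptually, one discrete step of the game therefore looks roughly like the outline below. Only calculate_step and take_step come from the assignment; the surrounding names (players, max_steps, all_evaders_captured) are assumptions used purely for illustration.

# hypothetical outline of the game loop as seen from the players' perspective
for step in range(max_steps):
    for player in players:
        # each player fills its self.next_robots (e.g., via monte_carlo_policy)
        player.calculate_step(gridmap, evaders, pursuers)
    for player in players:
        # the intended moves are applied and checked against the game rules
        player.take_step()
    if all_evaders_captured():
        break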
In MCTS, each player has a predefined time for making the decision for a single robot, given in the self.timeout variable. The timeout for the decision can be implemented per robot as follows:
# for each of the player's robots plan their actions
for idx in range(0, len(self.robots)):
    robot = self.robots[idx]
    # measure the time for selecting the next action
    clk = time.time()
    while (time.time() - clk) < self.timeout:
        # TODO: implement MCTS policy
        pos_selected = ...
    # select the next action
    self.next_robots[idx] = pos_selected
or, alternatively, with a single time budget shared by all of the player's robots:

# measure the time for selecting the next actions
clk = time.time()
while (time.time() - clk) < self.timeout * len(self.robots):
    # TODO: implement MCTS policy
    pass

# for each of the player's robots plan their actions
for idx in range(0, len(self.robots)):
    # select the next action (pos_selected found by the MCTS above)
    self.next_robots[idx] = pos_selected
The MCTS policy shall not treat repeated identical configurations in the tree as distinct states. A possible solution is to hash the states in the tree.
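One way to achieve this is a transposition table indexed by the hashable game state; the minimal sketch below reuses the assumed Node class and state tuple from the sketches above and is not part of the provided codebase.

# hypothetical transposition table: hashable game state -> tree node
transposition = {}

def get_node(state, parent=None):
    # reuse the existing node if this configuration has already been added to the tree
    if state not in transposition:
        transposition[state] = Node(state, parent)
    return transposition[state]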
The MCTS policy shall use an epsilon-greedy policy for the rollouts.
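An epsilon-greedy rollout mixes random exploration with greedy steps. In the following sketch, neighbors and greedy_value are placeholders for your own neighborhood expansion and greedy evaluation (e.g., distance to the closest opponent); the exploration rate is an assumed value.

import random

EPSILON = 0.1  # assumed exploration rate

def epsilon_greedy_step(position, neighbors, greedy_value):
    """Pick the next position: random with probability EPSILON, greedy otherwise."""
    candidates = neighbors(position)
    if random.random() < EPSILON:
        return random.choice(candidates)
    # greedy_value should be larger for positions that are better for the current player
    return max(candidates, key=greedy_value)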
The code can be evaluated using the following set of game scenarios and TIMEOUT=5.0 (see Additional Game Scenarios).
The evaluation code is extended as follows:
TIMEOUT = 5.0
games = [("grid", "games/grid_3.game"),
         ("grid", "games/grid_4.game"),
         ("grid", "games/grid_5.game"),
         ("pacman_small", "games/pacman_small_3.game"),
         ("pacman_small", "games/pacman_small_4.game"),
         ("pacman", "games/pacman_3.game"),
         ("pacman", "games/pacman_4.game"),
         ("pacman", "games/pacman_5.game")]
In the grid environment, the MONTE_CARLO pursuers are expected to catch the GREEDY evader.
Note that you can easily generate new game setups by modifying the .game files accordingly. In the upload system, the students' solutions are tested against the teacher's RANDOM and GREEDY policy players.