The main task is to implement Monte-Carlo Tree Search (MCTS) policy for robotics pursuit-evasion game.
Deadline | 28. December 2019, 23:59 PST |
Points | 6 |
Label in BRUTE | t4b-mcts |
Files to submit | Player.py |
Resources | T4b-mcts resource files |
In file player/Player.py
in function monte_carlo_policy
implement the MCTS policy decision making for pursuit-evasion game.
MCTS policy is a heuristics search algorithm for decision-making problems. The next-best state is selected in each discrete step of the game based on simulated playouts.
The monte_carlo_policy
function has the following prescription which follows the prescription:
def monte_carlo_policy(self, gridmap, evaders, pursuers): """ Method to calculate the monte carlo tree search policy action Parameters ---------- gridmap: GridMap Map of the environment evaders: list((int,int)) list of coordinates of evaders in the game (except the player's robots, if he is evader) pursuers: list((int,int)) list of coordinates of pursuers in the game (except the player's robots, if he is pursuer) """
The purpose of the function is to internally update the self.next_robots
variable, which is a list of (int, int)
robot coordinates based on the current state of the game, given gridmap
grid map of the environment and the player's role self.role
. The player is given the list evaders
of all evading robots in the game other than his robots and the list of pursuers
of all pursuing robots in the game other than his robots. I.e., the complete set of robots in the game is given as the union of evaders
, pursuers
and self.robots
.
During the gameplay, each player is asked to update their intention for the next move coded in the self.next_robots
variable by calling the calculate_step
function. Afterward, the step is performed by calling the take_step̈́
function followed by the game checking each step, whether it complies with the rules of the game.
The game ends after a predefined number of steps or when all the evaders are captured.
In MCTS, each player has a predefined time for making the decision for a single robot given in the self.timeout
variable.
Implementation details of the MCTS can be found in resources and we will also cover it in the lab.
Few notes to improve the solution:
[(int, int),(int, int)]
to int
. So for a pair of poses, it returns the minimal distance.
MCTS (Monte Carlo tree search)
UCB1 (Bandit based Monte-Carlo Planning)
Dealing with simultaneous moves (Monte Carlo Tree Search Variants for
Simultaneous Move Games)
RAVE (Monte-Carlo Tree Search and Rapid Action Value
Estimation in Computer Go)
Your agents should be able to catch the GREEDY
evader in all the game scenarios provided, given enough steps. And in the scenarios where your agent is the evader, it should be able to escape.
You can easily generate new game setups by modifying the .game
files accordingly.
In the upload system, the student's solutions are tested against the teacher's GREEDY
policy players on private game scenarios.
Comparison of Monte Carlo evader and greedy evader against greedy pursuers.
Comparison of Monte Carlo pursuers and greedy pursuers against greedy evader.