Search
This task will likely be extended!
Gomoku is a simple popular game played on a square grid. Players alternate in placing their mark (x or o) on an empty cell. The winner is the first player to get an unbroken row of exactly five marks horizontally, vertically, or diagonally.
An example terminal state – the winning trace is highlighted by uppercase letters X:
.Xo..... oXxx.... oXoo.x.. .Xo..... .Xo..... .....o.. ..x..... ........
Download task3_mcts.zip. In student.py, you should implement the following:
student.py
class MCTSBot: def __init__(self, play_as: int, time_limit: float): self.play_as = play_as self.time_limit = time_limit * 0.9 def play_action(self, board): # TODO: implement MCTS bot start_time = time.time() while (time.time() - start_time) < self.time_limit: pass # do some thinking... return random.choice( list(board.get_actions()) )
Your task is to implement the MCTS algorithm to play the game with a time limit of 10 second per move. The size of the board will be 8×8. You are given an example implementation of a bot that makes random actions. If you exceed the allotted time, your bot will be terminated and evaluation will be unsuccessful.
Your Task Create a new file named nn_mcts.py and implement the following components:
nn_mcts.py
1. Value Network Implement a PyTorch neural network that takes the current board state as input and outputs a scalar value representing the expected outcome for the current player (e.g., output in range [-1, 1]). You will also need to write a helper function to convert the game board into a PyTorch tensor (for example: 1.0 for the current player's marks, -1.0 for the opponent's marks, and 0.0 for empty cells).
PyTorch
2. Neural MCTS Bot Implement the NeuralMCTSBot class. You can reuse or inherit from MCTSBot from student.py.
NeuralMCTSBot
MCTSBot
import torch import torch.nn as nn from student import MCTSBot class NeuralMCTSBot(MCTSBot): def __init__(self, play_as: int, time_limit: float): super().__init__(play_as, time_limit) # TODO: Initialize your neural network # self.net = ValueNetwork() # load your model for evaluation here # torch.load("<name of your model>.pth", map_location="cpu")) def simulation(self, board): # TODO: Override the rollout/simulation phase # Instead of playing randomly to the end, convert the board to a tensor, # pass it through your neural network, and return the predicted value. pass def play_action(self, board): # TODO: Run MCTS loop with network evaluations pass
Instead of performing a random rollout when a node is expanded, query your Neural Network to get a value estimation for that state. Backpropagate this predicted value up the MCTS tree just like you would with a regular rollout result.
3. Model Training To make your network accurate, you will need to train it. You can implement a training script where your bot plays games against itself or a baseline bot (e.g. your implementation of MCTS bot). Store the board states encountered during the game, and once the game finishes, train your neural network to predict the final game outcome (e.g., 1 for a win, -1 for a loss, 0 for a draw) using those stored states. Save your trained model weights to a nn_mcts.pth file.
nn_mcts.pth
Submit your solution via Brute, where it will be evaluated automatically; the evaluation may take up to 15 minutes, so please be patient. Your submission should include both your original MCTS implementation (student.py) and your neural extension (nn_mcts.py), as well as your trained model file (e.g., nn_mcts.pth) if you implemented the neural network version. If you did not implement the neural network, you can still submit only the standard (non-neural) MCTS version (student.py) and receive partial credit. The standard MCTS solution is evaluated automatically and can earn up to 7 points. If you submit the neural network version, your NeuralMCTSBot will be evaluated under a strict time limit of 1.0 second per move against three reference MCTS bots using standard rollouts with time limits of 0.5 s, 1.0 s, and 1.5 s; for each opponent, winning at least 4 out of 6 matches earns 1 point, for a maximum of 3 additional points. Make sure your bot correctly loads its pretrained weights in the constructor and supports running on CPU if a GPU is not available, otherwise the evaluation may fail.