====== Task 3: Gomoku and MCTS ======
===== The Game =====

Gomoku is a simple popular game played on a
square grid. Players alternate in placing their mark (x or o) on an empty
cell. The winner is the first player to get an unbroken row of //exactly// five marks
horizontally, vertically, or diagonally.

An example terminal state -- the winning trace is highlighted by uppercase
letters X:

<code>
.Xo.....
oXxx....
oXoo.x..
.Xo.....
.Xo.....
.....o..
..x.....
........
</code>


===== Your Task =====

Download {{:courses:zui:tasks:task3_mcts.zip}}. In ''student.py'', you should implement the following:

<code python>
class MCTSBot:
    def __init__(self, play_as: int, time_limit: float):
        self.play_as = play_as
        self.time_limit = time_limit * 0.9

    def play_action(self, board):
        # TODO: implement MCTS bot

        start_time = time.time()
        while (time.time() - start_time) < self.time_limit:
        	pass # do some thinking...

        return random.choice( list(board.get_actions()) )
</code>

Your task is to implement the MCTS algorithm to play the game with a time limit of 10 second per move. The size of the board will be 8x8. You are given an example implementation of a bot that makes random actions. If you exceed the allotted time, your bot will be terminated and evaluation will be unsuccessful.

===== Task 3 Extension: Neural Network MCTS (NN-MCTS) ====
Your Task
Create a new file named ''nn_mcts.py'' and implement the following components:

1. Value Network
Implement a ''PyTorch'' neural network that takes the current board state as input and outputs a scalar value representing the expected outcome for the current player (e.g., output in range [-1, 1]). 
You will also need to write a helper function to convert the game board into a PyTorch tensor (for example: 1.0 for the current player's marks, -1.0 for the opponent's marks, and 0.0 for empty cells).

2. Neural MCTS Bot
Implement the ''NeuralMCTSBot'' class. You can reuse or inherit from ''MCTSBot'' from ''student.py''.

<code python>
import torch
import torch.nn as nn
from student import MCTSBot

class NeuralMCTSBot(MCTSBot):
    def __init__(self, play_as: int, time_limit: float):
        super().__init__(play_as, time_limit)
        # TODO: Initialize your neural network
        # self.net = ValueNetwork()
        # load your model for evaluation here
        # torch.load("<name of your model>.pth", map_location="cpu"))
        
    def simulation(self, board):
        # TODO: Override the rollout/simulation phase
        # Instead of playing randomly to the end, convert the board to a tensor,
        # pass it through your neural network, and return the predicted value.
        pass

    def play_action(self, board):
        # TODO: Run MCTS loop with network evaluations
        pass
</code>

Instead of performing a random rollout when a node is expanded, query your Neural Network to get a value estimation for that state. Backpropagate this predicted value up the MCTS tree just like you would with a regular rollout result.

3. Model Training
To make your network accurate, you will need to train it. You can implement a training script where your bot plays games against itself or a baseline bot (e.g. your implementation of MCTS bot). Store the board states encountered during the game, and once the game finishes, train your neural network to predict the final game outcome (e.g., 1 for a win, -1 for a loss, 0 for a draw) using those stored states. Save your trained model weights to a ''nn_mcts.pth'' file.


==== Hints ====
A network trained on just a handful of self-play games may perform worse than standard rollouts. Students are therefore encouraged to let training run for a longer time and continuously generate additional self-play data. Even a very small network can noticeably improve if trained on thousands of board states instead of hundreds. After validating that everything works as expected, it may make sense to run the data generation and/or training over night to achieve strong results. It is also strongly recommended to periodically save model checkpoints during training (for example after every training epoch or after every few games). This makes it possible to resume training later, compare different versions of the network, and avoid losing progress if the training process is interrupted.

Another useful strategy is to separate data generation from model training. Instead of updating the network immediately after every single game ("online" training), students may achieve more stable results by first collecting a larger dataset of played positions and outcomes (generated with standard MCTS) and then training the network offline in batches. Offline training often leads to smoother learning because the model is updated using a more diverse set of examples rather than only the most recent game. This approach also makes debugging and experimentation easier.
===== Evaluation =====
Submit your solution via Brute, where it will be evaluated automatically; the evaluation may take up to 15 minutes, so please be patient. Your submission should include both your original MCTS implementation (''student.py'') and your neural extension (''nn_mcts.py''), as well as your trained model file (e.g., ''nn_mcts.pth'') if you implemented the neural network version. If you did not implement the neural network, you can still submit only the standard (non-neural) MCTS version (''student.py'') and receive partial credit. The standard MCTS solution is evaluated automatically and can earn up to 7 points. If you submit the neural network version, your ''NeuralMCTSBot'' will be evaluated under a strict time limit of 1.0 second per move against three reference MCTS bots using standard rollouts with time limits of 0.5 s, 1.0 s, and 1.5 s; for each opponent, winning at least 4 out of 6 matches earns 1 point, for a maximum of 3 additional points. Make sure your bot correctly loads its pretrained weights in the constructor and supports running on CPU if a GPU is not available, otherwise the evaluation may fail.