====== Task 3: Playing Two-Player Game with MCTS ======

===== Five-in-a-row =====

Five-in-a-row (also called Gomoku) is a simple popular game played on a
square grid. Players alternate in placing their mark (x or o) on an empty
cell. The winner is the first player to get an unbroken row of //exactly// five marks
horizontally, vertically, or diagonally.

An example terminal state -- the winning trace is highlighted by uppercase
letters X:

<code>
.Xo.....
oXxx....
oXoo.x..
.Xo.....
.Xo.....
.....o..
..x.....
........
</code>


===== Your Task =====

Download {{ :courses:b4b36zui:tasks:task3_mcts.zip |}}. 

Your task is to implement MCTS algorithm to play the game efficiently within 
the time limit of 1 second per move. Use a custom heuristic pseudo-random simulation. Add a preference for the UCB bandit to explore
more promising moves, based on the rules of the game.

In ''student.py'', you should implement the following:

<code python>
class MCTSBot:
    def __init__(self, play_as: int):
        self.play_as = play_as

    def play_action(self, state: State):
        # TODO: implement MCTS bot according to the assignment task.
        return np.random.choice(state.legal_actions())
</code>

You are given an
example implementation of the game, code for interaction with the opponent
player and a sample random bot (as above). From standard input, the program receives the
size of the game board, the player to play as, and an alternating sequence of
actions of the opponent after your bot makes and prints an action. 

The evaluation keeps track of the current board, so it is important that you do
not change code outside of MCTSBot in your submission, to make sure yours 
and opponent's play is indeed consistent. 

There is a time limit of 1 seconds per move.
To make sure that you fit within the time limit, you can track the elapsed time
with Python's package ''time'' and the function ''time.time()''. The time
limit is strict -- your program will be shut down after the limit.

It is up to you how you solve the time management, but if you do it simply as
printing the action after some time is over, it is preferable that you set an
internal time limit that is below the hard deadline (maybe 0.9sec), so that
there is still large enough window of time before the deadline.

**Value function hint**: As the game outcomes are +-1 for win/loss, when you use a value function, make sure
that the heuristic values are within the open interval of ''(-1, 1)'' so the
algorithm has preference for choosing real game outcomes rather than the
heuristic ones, given a choice between the two in a search tree. This can be
done using a function that "saturates" outputs at -1 and 1 for any inputs 
''x'', such as ''+- 2*sigmoid(x)-1''

===== Evaluation =====

Upload your solution (only ''student.py'') to
[[https://cw.felk.cvut.cz/brute/teacher/course/1216|Brute]], where it will be
evaluated automatically.

The algorithm will play a number of matches against a portfolio of opponents of
various strength.

Based on the score against the opponent, you will receive 1 or 2 points, if the
score is higher than 2 or 5 respectively.

If timeout occurs more than 2 times against an opponent, you will receive zero 
points for that scenario. The matches with the timeout have a zero score.
In case any inconsistent play is detected or an error occurs, you will receive 
zero points for that scenario.