AI Gym: Frozen Lake

This task specification was created by one of your older schoolmates.

A description of the Frozen Lake environment in AI Gym.

The problem

You are given a 2D map (8×8 or 4×4) with a start position, a goal position, ice, and holes. Your task is to evolve a control strategy that gets you from the start to the goal as quickly as possible without falling through a hole in the ice into the cold water. You can move up, down, left, or right. The ice can be

non-slippery, so that you move in the direction of your choice, or
slippery, so that your move is stochastic: if you want to move up, you will move left, up, or right, each with probability 1/3 (the distribution can be changed).
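
The slippery behavior described above can be sketched as follows. This is a standalone illustration, not AI Gym's own code: the names `MOVES`, `SIDEWAYS`, `sample_direction` and `step` are hypothetical, and staying in place when a move would leave the map is an assumption. The `weights` parameter is where a different distribution of moves could be plugged in.

```python
import random

# Moves as (row, column) offsets; these names are illustrative, not AI Gym's.
MOVES = {"left": (0, -1), "down": (1, 0), "right": (0, 1), "up": (-1, 0)}

# The two directions perpendicular to each intended direction.
SIDEWAYS = {
    "up": ("left", "right"),
    "down": ("left", "right"),
    "left": ("up", "down"),
    "right": ("up", "down"),
}


def sample_direction(intended, slippery=True, weights=(1/3, 1/3, 1/3), rng=random):
    """Return the direction actually taken.

    weights = (slip one way, go as intended, slip the other way).
    """
    if not slippery:
        return intended
    first, second = SIDEWAYS[intended]
    return rng.choices([first, intended, second], weights=weights)[0]


def step(position, direction, size):
    """Apply a move on a size x size map.

    Assumption: a move that would leave the map keeps you on the edge cell.
    """
    row, col = position
    d_row, d_col = MOVES[direction]
    new_row = min(max(row + d_row, 0), size - 1)
    new_col = min(max(col + d_col, 0), size - 1)
    return new_row, new_col
```

Passing, for example, `weights=(0.1, 0.8, 0.1)` would make the ice less slippery without removing the randomness entirely.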

The simulation ends when you reach the goal position or step into a hole in the ice. The evaluation you get from AI Gym is

0, if you step in a hole,
1, if you get to the goal position.

For the slippery ice you can execute several runs and estimate the probability that you reach the goal. For non-slippery ice you will get 0 if you do not reach the goal position, no matter how close you get. (Maybe it could be changed to 1/x, where x is the Manhattan distance to the goal position?)
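
Estimating the success probability from several runs might look like the sketch below. It uses a small self-contained simulator instead of the AI Gym API, so everything in it is an assumption made for illustration: the map `LAKE` and its letters (S = start, F = frozen ice, H = hole, G = goal), the action encoding, the step limit, and the names `run_episode`, `estimate_success` and `EXAMPLE_POLICY`.

```python
import random

# Hypothetical 4x4 map: S = start, F = frozen ice, H = hole, G = goal.
LAKE = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]

# Actions 0..3 as (row, column) offsets: left, down, right, up.
# In this order, action - 1 and action + 1 (mod 4) are the perpendicular moves.
OFFSETS = [(0, -1), (1, 0), (0, 1), (-1, 0)]

# Hypothetical policy vector: one action per cell, row by row.
EXAMPLE_POLICY = [1, 0, 0, 0,
                  1, 0, 0, 0,
                  2, 1, 0, 0,
                  0, 2, 2, 0]


def run_episode(policy, lake, rng, slippery=True, max_steps=100):
    """Return 1 if the goal is reached, 0 on a hole or when steps run out."""
    size = len(lake)
    row, col = 0, 0  # assumption: the start is the top-left cell
    for _ in range(max_steps):
        action = policy[row * size + col]
        if slippery:
            # Intended direction or one of the two perpendicular ones, 1/3 each.
            action = (action + rng.choice([-1, 0, 1])) % 4
        d_row, d_col = OFFSETS[action]
        # Assumption: moving off the map keeps you on the edge cell.
        row = min(max(row + d_row, 0), size - 1)
        col = min(max(col + d_col, 0), size - 1)
        if lake[row][col] == "G":
            return 1
        if lake[row][col] == "H":
            return 0
    return 0


def estimate_success(policy, lake, runs=1000, seed=0, slippery=True):
    """Fraction of runs that reach the goal (a Monte Carlo estimate)."""
    rng = random.Random(seed)
    wins = sum(run_episode(policy, lake, rng, slippery) for _ in range(runs))
    return wins / runs
```

Calling `estimate_success(EXAMPLE_POLICY, LAKE)` gives a value between 0 and 1 that can serve as the fitness of a strategy on slippery ice; more runs give a less noisy estimate at a higher cost.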

Possible representations

You should evolve (search for) strategies, so you need to represent a strategy somehow:

A 4×4/8×8 matrix or vector representing the strategy (policy): the value in each cell says which way to go from that position.
A matrix or vector representing the “profit” of stepping onto each position (how good it is to reach the goal via that position). The decision in each cell is then the direction in which the most profitable neighbor lies.
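
The two representations are related: a “profit” matrix can be turned into a policy vector by looking at the neighbors of each cell. The sketch below shows one way to do that. The action encoding (0 = left, 1 = down, 2 = right, 3 = up), the tie-breaking rule (the first best action wins) and the names `greedy_policy` and `show_policy` are assumptions made for this example.

```python
# Actions 0..3 as (row, column) offsets: left, down, right, up (assumed encoding).
OFFSETS = [(0, -1), (1, 0), (0, 1), (-1, 0)]
ARROWS = "<v>^"  # one display character per action, in the same order


def greedy_policy(profit):
    """Turn a square profit matrix into a policy vector (row by row).

    In each cell, pick the action that leads to the most profitable
    neighbor; neighbors outside the map are ignored.
    """
    size = len(profit)
    policy = []
    for row in range(size):
        for col in range(size):
            best_action, best_value = 0, float("-inf")
            for action, (d_row, d_col) in enumerate(OFFSETS):
                n_row, n_col = row + d_row, col + d_col
                inside = 0 <= n_row < size and 0 <= n_col < size
                if inside and profit[n_row][n_col] > best_value:
                    best_action, best_value = action, profit[n_row][n_col]
            policy.append(best_action)
    return policy


def show_policy(policy, size):
    """Render a policy vector as rows of arrow characters."""
    rows = []
    for r in range(size):
        rows.append("".join(ARROWS[a] for a in policy[r * size:(r + 1) * size]))
    return "\n".join(rows)
```

With the policy vector, the genome holds one of four discrete values per cell; with the profit matrix, it holds one real number per cell, which may suit real-valued evolutionary operators better.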

Possible tasks to explore

find the best strategy
try a smaller or bigger map if the task is too hard or not hard enough
try changing the evaluation and solving the non-slippery variant
try different distributions of moves for slippery ice

Suggestions

When visualizing the lake, display the holes e.g. by character '.'.
Make sure that you always use the same map across repeated executions. Doesn't AI Gym initialize the map in a different way each time?
If possible, I suggest that the evaluation function return more granular results than just 0 or 1. The distance to the goal is a good candidate. A longer path that brings you to the goal safely should score better than a short one that probably ends in a hole near the goal.
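
One possible graded evaluation is sketched below. It is a variant of the 1/x idea mentioned earlier, shifted to 1/(1 + x) so that reaching the goal (x = 0) scores exactly 1; that shift, the choice to score a hole as 0, and the names `manhattan`, `graded_score` and `mean_score` are assumptions of this example, not part of AI Gym.

```python
def manhattan(a, b):
    """Manhattan distance between two (row, column) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def graded_score(final_position, goal, fell_in_hole):
    """Score one run in the range [0, 1].

    0 for a hole, 1 for the goal, otherwise 1 / (1 + distance to the goal),
    so stopping closer to the goal is worth more.
    """
    if fell_in_hole:
        return 0.0
    return 1.0 / (1 + manhattan(final_position, goal))


def mean_score(runs, goal):
    """Average graded score over runs given as (final_position, fell_in_hole)."""
    return sum(graded_score(pos, goal, hole) for pos, hole in runs) / len(runs)
```

Because a hole always scores 0, averaging over several slippery runs favors a longer safe route over a short route that often ends in a hole next to the goal.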
