Search
Help robot Emil find its way to the pot(s) of gold in a world filled with deadly pits. The implementation is possible only in Java. This assignment is undergoing some changes this semester; grading and details are subject to change.
Q: What does the testing map look like?
A: It is the same size as the public map and has the same gold and obstacle ratios. It differs only in the seed used to generate the map.
Implement a decision-making routine for an agent in a Wumpus world. The assignment consists of three tasks that will be handed out in parts.
The assignment is considered "successfully completed" if you get at least 10 points.
The deadlines for the tasks are soft: there is a 1-point penalty for every day the submission is late.
If you have any questions, you can use the Assignment 2 Discussion Forum.
Robot Emil moves in a grid world with 4 possible actions that have stochastic effects. Gold, pits, and the wumpus are terminal states. Reaching gold gives the agent a large positive utility, while falling into a pit or encountering the wumpus gives it a large negative utility. The agent also receives a small penalty for each move.
The available actions and cell contents are defined by the Action and CellContent enums in the provided environment.
Download the Wumpus world environment with the solution template: Solution template.
Implement decision making for an agent in the Wumpus world environment using the Value Iteration algorithm. The agent must act optimally over a finite horizon h=200. For more info see e.g. mdps-exact-methods.pdf, page 8.
We will call wumpus.Agent.nextStep() to execute the policy. Any auxiliary code should be in the wumpus.agent package. To run the simulation, execute the WumpusWorldCreator class.
Your IDE must be configured to use the jar file in /lib as a dependency.
You can use WorldModel.getTransitions() to obtain all transitions (along with the transition probability and the reward received) that may occur when a particular action is applied in a particular state.
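For illustration, a finite-horizon Value Iteration backup built on top of getTransitions() could look roughly like the sketch below. The map dimensions (WIDTH, HEIGHT), the state enumeration (allStates()), the State coordinate fields, and the Transition accessors (getState(), getProbability(), getReward()) are assumptions here; adapt them to the actual classes in the template.

  // Finite-horizon Value Iteration, h = 200 (a sketch; identifiers named
  // in the text above are assumptions, not the template's actual API).
  float[][] values = new float[WIDTH][HEIGHT];       // V_0 = 0 everywhere
  for (int step = 0; step < 200; step++) {
      float[][] next = new float[WIDTH][HEIGHT];
      for (State s : allStates()) {                  // non-terminal states only
          float best = Float.NEGATIVE_INFINITY;
          for (Action a : Action.values()) {
              float q = 0f;
              // expected immediate reward plus the value of the successor
              for (Transition t : WorldModel.getTransitions(s, a)) {
                  q += t.getProbability()
                     * (t.getReward() + values[t.getState().x][t.getState().y]);
              }
              best = Math.max(best, q);
          }
          next[s.x][s.y] = best;                     // Bellman backup: V_{k+1}(s) = max_a Q_k(s,a)
      }
      values = next;
  }

The greedy policy (and the state-action values for the debugging layers described next) fall out of the same backup if you also remember, per state, the maximizing action and the Q-values.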
In order to assist you with debugging, we have prepared three visualization layers:
State-Value: DebugVis.setStateValues(float[][] stateValues)
State-Action-Value: DebugVis.setStateActionValues(float[][][] stateActionValues)
Policy: DebugVis.setPolicy(Action[][] policy)
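For example, assuming your algorithm fills arrays values, qValues, and policy indexed by cell coordinates (and by Action.ordinal() for the state-action values, which is an assumption about the expected layout), you can push them to the overlays as follows:

  DebugVis.setStateValues(values);           // float[width][height]
  DebugVis.setStateActionValues(qValues);    // float[width][height][4], third index = Action.ordinal()
  DebugVis.setPolicy(policy);                // Action[width][height]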
Implement decision making for an agent in the Wumpus world environment using the Monte-Carlo Tree Search algorithm. The agent should attempt to maximize the cumulative reward over the given horizon. For more info about MCTS in the context of MDP planning see e.g. 07-mcp.pdf, slides 19-72, or the following survey and especially these lecture notes on UCT for MDPs. There are many different flavors of Monte-Carlo planning techniques for MDPs and you can choose any of them. For example, you can try single-state Monte-Carlo with random or other baseline policy rollouts, sparse sampling, adaptive Monte-Carlo tree search with a UCB or epsilon-greedy exploration-exploitation policy, or any other instantiation of the THTS algorithmic scheme.
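For a concrete picture of the UCB variant: the standard UCB1 rule selects, in a tree node, the action maximizing the estimated value plus an exploration bonus. The Node bookkeeping fields (qValue, actionCount, totalCount) in this sketch are assumptions about your own search-tree data structures; only the formula itself is standard:

  // UCB1 action selection inside a search tree node (a sketch).
  Action selectUCB1(Node n, double c) {
      Action best = null;
      double bestScore = Double.NEGATIVE_INFINITY;
      for (Action a : Action.values()) {
          int na = n.actionCount[a.ordinal()];
          if (na == 0) {
              return a;                              // try every action at least once
          }
          double score = n.qValue[a.ordinal()]
                  + c * Math.sqrt(Math.log(n.totalCount) / na);   // exploration bonus
          if (score > bestScore) {
              bestScore = score;
              best = a;
          }
      }
      return best;
  }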
Please download the Solution template.
Your IDE must be configured to use the jar file in /lib as a dependency. To run the simulations, execute the Task2Creator and Task3Creator classes for Task 2 and Task 3, respectively.
Your task is to implement the action selection mechanism within the wumpus.agent.Agent.nextStep() method such that the agent attempts to maximize the sum of rewards received in the Wumpus world over a horizon of 50 steps (Task 2) and 100 steps (Task 3).
The solution should consist of the wumpus.agent.Agent class and any other supporting classes, which however need to be in the wumpus.agent package or some subpackage of it. The simulator will call wumpus.agent.Agent.nextStep() at each step to obtain the action that the agent executes in the current state. The agents for Task 2 and Task 3 can be identical or different, whichever you find more appropriate.
In contrast to Task 1, in both Task 2 and Task 3, the simulation continues when the robot reaches gold. On top of that, in Task 3, there are several wumpuses moving randomly in the world, depicted as red circles. When a wumpus steps on the same cell as the agent, the agent dies (reward: -99) and the simulation ends.
You can use WorldModel.performAction(s, a, rnd) to simulate the outcome of action a being executed in state s using random number generator rnd. The random number generator should be initialized at the beginning of each simulation run as rnd = new Random(seed). If seed is a fixed constant, then the random number generator will generate the same sequence of random numbers on each program execution, which is handy for debugging.
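To make this concrete, below is a sketch of a single-state (flat) Monte-Carlo nextStep() that estimates each applicable action by averaging the returns of short random rollouts simulated with WorldModel.performAction. The Outcome type with its getState()/getReward() accessors, the List<Action> return type of getActions, the isTerminal() test, the nextStep(WorldState) signature, and the rollout constants are all assumptions to be adapted to the template:

  import java.util.List;
  import java.util.Random;

  // Flat Monte-Carlo action selection: average the returns of random
  // rollouts for each action, then pick the action with the best average.
  public Action nextStep(WorldState state) {
      final int ROLLOUTS = 100;                      // per-action rollouts; tune to the time budget
      final int DEPTH = 30;                          // rollout horizon; tune to the task horizon
      Random rnd = new Random(0);                    // fixed seed: reproducible runs for debugging
      Action best = null;
      double bestAvg = Double.NEGATIVE_INFINITY;
      for (Action a : WorldModel.getActions(state)) {
          double sum = 0;
          for (int i = 0; i < ROLLOUTS; i++) {
              Outcome o = WorldModel.performAction(state, a, rnd);
              double ret = o.getReward();
              WorldState s = o.getState();
              for (int d = 0; d < DEPTH && !s.isTerminal(); d++) {
                  List<Action> acts = WorldModel.getActions(s);
                  Action r = acts.get(rnd.nextInt(acts.size()));  // uniform random rollout policy
                  Outcome next = WorldModel.performAction(s, r, rnd);
                  ret += next.getReward();
                  s = next.getState();
              }
              sum += ret;
          }
          double avg = sum / ROLLOUTS;
          if (avg > bestAvg) {                       // keep the action with the best average return
              bestAvg = avg;
              best = a;
          }
      }
      return best;
  }

A full UCT agent replaces these independent per-action averages with a search tree and the UCB1 selection sketched above, reusing statistics across rollouts.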
Automatic evaluation scripts cannot run longer than 15 minutes. There will be 15 simulations for Task 2 (with 50 decision points during each simulation) and 10 simulations for Task 3 (with 100 decision points during each simulation). Thus, please limit the time your agent takes to make a decision at each step of the simulation accordingly; as a rough guide, spreading 15 minutes over the 15 × 50 = 750 (resp. 10 × 100 = 1000) decision points of a task leaves about one second per decision.
In order to get full points for the assignment, your agent should be "reasonably" efficient. If your agent achieves an average utility of > 500 in Task 2 and the algorithm generalizes to different maps, you can expect to get 7 points for Task 2 from the automatic evaluation (AE). Similarly, if your algorithm achieves an average utility of > 600 in Task 3, you can expect to get 7 points for Task 3 from the AE.
Submit your solution through the CW Upload system before the deadline as a single zip file. At the top level of the zip file there should be a src directory that contains the source code. That is, the contents of the archive should have the following structure:

src/
  wumpus/
    agent/
      Agent.java
      ... (any supporting classes or subpackages)
Useful classes and methods:
javax.vecmath.Point2i - a 2D integer point from the vecmath library
WumpusWorldCreator.java - runs the Task 1 simulation
Action.ordinal() - the index of an action, with Action.NORTH.ordinal() == 0
WorldModel.performAction - simulates the outcome of one action
WorldState - the state of the world passed to the agent
WorldModel.getActions(state) - the actions applicable in a given state