Your task is to implement the value iteration and policy iteration methods to find the optimal strategy (policy) for the given MDP.
In the module mdp_agent.py, implement two classes:
ValueIterationAgent, which finds the optimal strategy using the value iteration method, and
PolicyIterationAgent, which finds the optimal strategy using the policy iteration method.
The interface of both classes is identical; both must implement the following methods:
| method | input parameters | output parameters | explanation |
|---|---|---|---|
| `__init__` | env: MDPProblem, gamma: float, epsilon: float | none | Agent initialization. |
| `find_policy` | none | Policy | Returns the optimal strategy, i.e., a dictionary of (state, action) pairs. |
env is the environment, i.e., an object of type kuimaze2.MDPProblem.
gamma is the so-called “discount factor”, taken from the open interval (0, 1).
epsilon is the maximum allowed error for the values of individual states (used in value iteration).
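To make the role of epsilon concrete, here is a minimal sketch of value iteration with an epsilon-based stopping rule. It assumes the stopping rule from Russell & Norvig's textbook (stop once the largest change in one sweep is below epsilon * (1 - gamma) / gamma, which keeps every value within epsilon of the optimum); whether your solution should use exactly this rule is an assumption. It also assumes a reward R(s) collected in the current state and a terminal value equal to its reward. `ToyMDP` and its method names are hypothetical stand-ins invented for this sketch, not the kuimaze2 API; see example_mdp.py for how kuimaze2.MDPProblem is actually queried.

```python
from typing import Dict, List, Tuple


class ToyMDP:
    """Hypothetical stand-in for kuimaze2.MDPProblem (method names invented).

    A 3-state chain 0 -> 1 -> 2; state 2 is terminal.
    """

    def get_states(self) -> List[int]:
        return [0, 1, 2]

    def get_actions(self, state: int) -> List[str]:
        return ["left", "right"]

    def is_terminal(self, state: int) -> bool:
        return state == 2

    def get_reward(self, state: int) -> float:
        return 1.0 if state == 2 else -0.04

    def get_next_states_and_probs(self, state: int, action: str) -> List[Tuple[int, float]]:
        # The intended move succeeds with probability 0.8; otherwise the agent stays put.
        target = min(state + 1, 2) if action == "right" else max(state - 1, 0)
        return [(target, 0.8), (state, 0.2)]


def value_iteration(env: ToyMDP, gamma: float, epsilon: float) -> Dict[int, float]:
    """Return state values that are within epsilon of the optimal ones (max-norm)."""
    values = {s: 0.0 for s in env.get_states()}
    # Russell & Norvig: if the largest update is below this threshold,
    # no state value is more than epsilon away from its optimal value.
    threshold = epsilon * (1 - gamma) / gamma
    while True:
        new_values: Dict[int, float] = {}
        for s in env.get_states():
            if env.is_terminal(s):
                new_values[s] = env.get_reward(s)  # assumed convention
            else:
                # Bellman optimality backup: best expected successor value.
                best = max(
                    sum(p * values[s2] for s2, p in env.get_next_states_and_probs(s, a))
                    for a in env.get_actions(s)
                )
                new_values[s] = env.get_reward(s) + gamma * best
        delta = max(abs(new_values[s] - values[s]) for s in values)
        values = new_values
        if delta < threshold:
            return values


values = value_iteration(ToyMDP(), gamma=0.9, epsilon=1e-4)
print({s: round(v, 3) for s, v in values.items()})  # {0: 0.679, 1: 0.829, 2: 1.0}
```

A smaller epsilon gives values closer to the optimum at the cost of more sweeps; as gamma approaches 1 the threshold shrinks and convergence slows down.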
The find_policy() method must return a policy represented as a dictionary, where the key is always a state (an instance of the class kuimaze2.State) and the value is the optimal action for that state (an instance of the class kuimaze2.Action). The policy must contain an action for every free state, including terminal ones; the specific action chosen for terminal states does not matter.
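The shape requirement (one valid action for every free state, terminal states included) is easy to check before submitting. The helper below is a minimal sketch; `check_policy` and its parameters are hypothetical names, and the (row, col) states and action strings in the example are invented placeholders for kuimaze2.State and kuimaze2.Action instances.

```python
from collections.abc import Callable, Hashable, Iterable


def check_policy(
    policy: dict[Hashable, Hashable],
    free_states: Iterable[Hashable],
    actions_of: Callable[[Hashable], list],
) -> list[str]:
    """Return a list of problems; an empty list means the policy has the required shape."""
    problems = []
    for state in free_states:
        if state not in policy:
            # Terminal states also need an entry, even though any action will do.
            problems.append(f"missing action for state {state!r}")
        elif policy[state] not in actions_of(state):
            problems.append(f"invalid action {policy[state]!r} in state {state!r}")
    return problems


# Hypothetical example: states are (row, col) tuples and (0, 2) is terminal.
free_states = [(0, 0), (0, 1), (0, 2)]
actions = ["UP", "RIGHT", "DOWN", "LEFT"]
good = {(0, 0): "RIGHT", (0, 1): "RIGHT", (0, 2): "UP"}  # terminal gets an arbitrary action
bad = {(0, 0): "RIGHT", (0, 1): "RIGHT"}  # terminal state forgotten

print(check_policy(good, free_states, lambda s: actions))  # []
print(check_policy(bad, free_states, lambda s: actions))  # ['missing action for state (0, 2)']
```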
The implementations of ValueIterationAgent and PolicyIterationAgent must correspond to the assignment. For example, PolicyIterationAgent.find_policy() must not simply call ValueIterationAgent.find_policy() or implement the value iteration algorithm itself (and vice versa). In such a case, the entire task will be evaluated with 0 points!
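The difference between the two methods lies in their inner loop. Policy iteration alternates between evaluating a fixed policy (backups that follow the policy's action, with no max over actions) and greedily improving it, stopping when no state changes its action. The sketch below is a minimal illustration under stated assumptions: it evaluates policies iteratively rather than by solving the linear system exactly, and it uses the same reward convention as a simple textbook MDP. `ToyMDP`, its method names, and the helpers `evaluate_policy` and `policy_iteration` are hypothetical, not the kuimaze2 API.

```python
from typing import Dict, List, Tuple


class ToyMDP:
    """Hypothetical stand-in for kuimaze2.MDPProblem (method names invented)."""

    def get_states(self) -> List[int]:
        return [0, 1, 2]

    def get_actions(self, state: int) -> List[str]:
        return ["left", "right"]

    def is_terminal(self, state: int) -> bool:
        return state == 2

    def get_reward(self, state: int) -> float:
        return 1.0 if state == 2 else -0.04

    def get_next_states_and_probs(self, state: int, action: str) -> List[Tuple[int, float]]:
        # The intended move succeeds with probability 0.8; otherwise the agent stays put.
        target = min(state + 1, 2) if action == "right" else max(state - 1, 0)
        return [(target, 0.8), (state, 0.2)]


def evaluate_policy(env: ToyMDP, policy: Dict[int, str], gamma: float, tol: float = 1e-10) -> Dict[int, float]:
    """Iterative policy evaluation: backups follow policy[s]; there is no max over actions."""
    values = {s: 0.0 for s in env.get_states()}
    while True:
        delta = 0.0
        for s in env.get_states():
            if env.is_terminal(s):
                new = env.get_reward(s)
            else:
                new = env.get_reward(s) + gamma * sum(
                    p * values[s2] for s2, p in env.get_next_states_and_probs(s, policy[s])
                )
            delta = max(delta, abs(new - values[s]))
            values[s] = new  # in-place update
        if delta < tol:
            return values


def policy_iteration(env: ToyMDP, gamma: float) -> Dict[int, str]:
    policy = {s: env.get_actions(s)[0] for s in env.get_states()}  # arbitrary initial policy
    while True:
        values = evaluate_policy(env, policy, gamma)
        stable = True
        for s in env.get_states():
            if env.is_terminal(s):
                continue  # any action is acceptable in a terminal state

            def q(a: str) -> float:
                return sum(p * values[s2] for s2, p in env.get_next_states_and_probs(s, a))

            best = max(env.get_actions(s), key=q)
            # Switch only on a clear gain so evaluation round-off cannot cause endless flip-flopping.
            if q(best) > q(policy[s]) + 1e-9:
                policy[s] = best
                stable = False
        if stable:
            return policy


print(policy_iteration(ToyMDP(), gamma=0.9))  # {0: 'right', 1: 'right', 2: 'left'}
```

Because the evaluation step never maximizes over actions, this loop is policy iteration rather than value iteration in disguise; the max appears only in the improvement step.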
MDPProblem, or that you need to use non-public variables and methods (whose name starts with _), discuss it with your instructor.
kuimaze2 package in it.
kuimaze2 package, you will also find the script example_mdp.py, which shows how to work with the environment. It can serve as starter code for implementing both classes.
example_mdp.py) to extract shared parts into a common ancestor of both classes:

```python
class MDPAgent:
    # Parts common to both methods/agents
    ...

class ValueIterationAgent(MDPAgent):
    # Parts specific for value iteration
    ...

class PolicyIterationAgent(MDPAgent):
    # Parts specific for policy iteration
    ...
```
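As one possible way to fill in the common ancestor, the sketch below moves the constructor, the expected-successor-value computation, and greedy policy extraction (including an arbitrary action for terminal states) into `MDPAgent`, so each subclass only has to implement its own find_policy(). The helper names `expected_value` and `greedy_policy`, the `ToyMDP` class, and its method names are hypothetical and invented for this sketch; they are not part of kuimaze2. The values passed in the usage line are illustrative, roughly the fixed point of the toy MDP for gamma = 0.9.

```python
from typing import Dict, List, Tuple


class ToyMDP:
    """Hypothetical stand-in for kuimaze2.MDPProblem (method names invented)."""

    def get_states(self) -> List[int]:
        return [0, 1, 2]

    def get_actions(self, state: int) -> List[str]:
        return ["left", "right"]

    def is_terminal(self, state: int) -> bool:
        return state == 2

    def get_next_states_and_probs(self, state: int, action: str) -> List[Tuple[int, float]]:
        # The intended move succeeds with probability 0.8; otherwise the agent stays put.
        target = min(state + 1, 2) if action == "right" else max(state - 1, 0)
        return [(target, 0.8), (state, 0.2)]


Policy = Dict[int, str]  # stand-in for a dict mapping kuimaze2.State -> kuimaze2.Action


class MDPAgent:
    """Parts common to both methods/agents."""

    def __init__(self, env: ToyMDP, gamma: float, epsilon: float) -> None:
        self.env = env
        self.gamma = gamma
        self.epsilon = epsilon

    def expected_value(self, state: int, action: str, values: Dict[int, float]) -> float:
        """Probability-weighted value of the successors of (state, action)."""
        return sum(p * values[s2] for s2, p in self.env.get_next_states_and_probs(state, action))

    def greedy_policy(self, values: Dict[int, float]) -> Policy:
        """Best action in every state; terminal states get an arbitrary action."""
        policy: Policy = {}
        for s in self.env.get_states():
            actions = self.env.get_actions(s)
            if self.env.is_terminal(s):
                policy[s] = actions[0]
            else:
                policy[s] = max(actions, key=lambda a: self.expected_value(s, a, values))
        return policy

    def find_policy(self) -> Policy:
        raise NotImplementedError("implemented by each subclass")


class ValueIterationAgent(MDPAgent):
    """Parts specific for value iteration (body omitted in this sketch)."""


class PolicyIterationAgent(MDPAgent):
    """Parts specific for policy iteration (body omitted in this sketch)."""


agent = ValueIterationAgent(ToyMDP(), gamma=0.9, epsilon=1e-3)
print(agent.greedy_policy({0: 0.68, 1: 0.83, 2: 1.0}))  # {0: 'right', 1: 'right', 2: 'left'}
```

Sharing only these helpers keeps the two algorithms themselves separate, which matters given the rule that neither agent may reuse the other's find_policy().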
08-MDPs.
mdp_agent.py, or a ZIP archive with the module mdp_agent.py and any other modules you created that your agent needs/imports. These files must be in the root of the archive; the archive must not contain any directories! Do not include/submit any modules that you received from us!
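If you submit a ZIP archive, the standard-library zipfile module can build one with every file at the root. The helper `make_submission` below is a hypothetical name for this sketch; passing only the file name as `arcname` strips any directory part, so no folders end up inside the archive.

```python
import zipfile
from pathlib import Path


def make_submission(zip_path: str, *module_paths: str) -> None:
    """Pack the given modules into zip_path, all at the archive root (no directories)."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for module in module_paths:
            # arcname keeps only the file name, e.g. src/helpers.py is stored as helpers.py
            zf.write(module, arcname=Path(module).name)
```

For example, `make_submission("submission.zip", "mdp_agent.py", "helpers.py")` would pack mdp_agent.py together with a hypothetical helper module of your own; do not add the kuimaze2 modules you received.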
Learn about evaluation and scoring of the task.