Your task is to implement the value iteration and policy iteration methods to find the optimal strategy (policy) for the given MDP.
In the module `mdp_agent.py`, implement two classes: `ValueIterationAgent`, which finds the optimal strategy using the value iteration method, and `PolicyIterationAgent`, which finds the optimal strategy using the policy iteration method.
The interface of both classes is identical, both must implement the following methods:
| method | input parameters | output parameters | explanation |
|---|---|---|---|
| `__init__` | `env: MDPProblem`, `gamma: float`, `epsilon: float` | none | Agent initialization. |
| `find_policy` | none | `Policy` | Returns the optimal strategy, i.e., a dictionary of pairs (state, action). |
- `env` is the environment, i.e., an object of type `kuimaze2.MDPProblem`.
- `gamma` is the so-called "discount factor" from the range (0, 1).
- `epsilon` is the maximum allowed error for the values of individual states (used in value iteration).
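For orientation, value iteration repeatedly applies the Bellman optimality update until no state value changes by more than `epsilon`. The sketch below runs on a hand-rolled three-state toy MDP: the `transitions`/`rewards` dictionaries, the state and action names, and the per-state reward model are invented for illustration and are *not* the `kuimaze2.MDPProblem` API.

```python
# Value-iteration sketch on a toy MDP. The data layout below is an
# assumption for illustration only, NOT the kuimaze2.MDPProblem API.
# transitions[state][action] -> list of (next_state, probability) pairs.
transitions = {
    "A": {"left": [("A", 1.0)], "right": [("B", 1.0)]},
    "B": {"left": [("A", 1.0)], "right": [("T", 1.0)]},
    "T": {},  # terminal state: no actions
}
rewards = {"A": -0.1, "B": -0.1, "T": 1.0}  # assumed per-state rewards


def value_iteration(gamma: float, epsilon: float) -> dict[str, float]:
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        new_values = {}
        for s, actions in transitions.items():
            if not actions:
                new_values[s] = rewards[s]  # terminal: just its reward
            else:
                # Bellman optimality update: best expected successor value.
                new_values[s] = rewards[s] + gamma * max(
                    sum(p * values[s2] for s2, p in outcomes)
                    for outcomes in actions.values()
                )
            delta = max(delta, abs(new_values[s] - values[s]))
        values = new_values
        if delta < epsilon:  # every state changed by less than epsilon
            return values


V = value_iteration(gamma=0.9, epsilon=1e-6)
```

With these numbers the values converge to V(T) = 1.0, V(B) = 0.8, V(A) = 0.62. A common, tighter stopping bound uses `epsilon * (1 - gamma) / gamma` instead of plain `epsilon`; check which criterion the assignment expects.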
The output of the `find_policy()` method must be a policy represented as a dictionary, where each key is a state (an instance of the class `kuimaze2.State`) and the corresponding value is the optimal action for that state (an instance of the class `kuimaze2.Action`). The policy must contain an action for all free states, including terminal ones; the specific action chosen for terminal states does not matter.
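To illustrate the required shape of the returned policy, here is a sketch of greedy policy extraction from already-computed state values. Plain strings stand in for `kuimaze2.State` and `kuimaze2.Action` instances, and the data layout is invented for illustration, not the real API.

```python
# Greedy policy extraction: map each state to the action with the best
# expected successor value. Strings stand in for State/Action instances.
transitions = {
    "A": {"left": [("A", 1.0)], "right": [("B", 1.0)]},
    "B": {"left": [("A", 1.0)], "right": [("T", 1.0)]},
    "T": {},  # terminal: no real actions
}
values = {"A": 0.62, "B": 0.8, "T": 1.0}  # assumed precomputed values


def extract_policy(values, transitions):
    policy = {}
    for s, actions in transitions.items():
        if not actions:
            # Terminal states must still appear in the policy;
            # the concrete action chosen here is arbitrary.
            policy[s] = "left"
        else:
            policy[s] = max(
                actions,
                key=lambda a: sum(p * values[s2] for s2, p in actions[a]),
            )
    return policy
```

Here `extract_policy(values, transitions)` returns `{"A": "right", "B": "right", "T": "left"}` — every free state, including the terminal one, appears as a key.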
The implementations of `ValueIterationAgent` and `PolicyIterationAgent` must correspond to the assignment. For example, it is not allowed to simply call `ValueIterationAgent.find_policy()` from `PolicyIterationAgent.find_policy()`, or to implement the value iteration algorithm inside it (or vice versa). In such a case, the entire task will be evaluated with 0 points!
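As a reminder of why the two algorithms are genuinely different: policy iteration alternates policy *evaluation* (computing values for a fixed policy) with greedy policy *improvement*, instead of iterating the optimality update directly. A sketch on an invented toy MDP (again, the data layout is an assumed stand-in, not the `kuimaze2` API):

```python
# Policy-iteration sketch on a toy MDP; the dictionaries are an assumed
# stand-in for the environment API, for illustration only.
transitions = {
    "A": {"left": [("A", 1.0)], "right": [("B", 1.0)]},
    "B": {"left": [("A", 1.0)], "right": [("T", 1.0)]},
    "T": {},  # terminal state
}
rewards = {"A": -0.1, "B": -0.1, "T": 1.0}


def policy_iteration(gamma: float) -> dict:
    # Start from an arbitrary policy (first listed action; None for terminals).
    policy = {s: next(iter(a), None) for s, a in transitions.items()}
    values = {s: 0.0 for s in transitions}
    while True:
        # 1) Policy evaluation: iterate the Bellman equation for the FIXED
        #    policy (no max over actions here).
        for _ in range(1000):
            values = {
                s: rewards[s]
                + (gamma * sum(p * values[s2]
                               for s2, p in transitions[s][policy[s]])
                   if policy[s] else 0.0)
                for s in transitions
            }
        # 2) Policy improvement: act greedily w.r.t. the evaluated values.
        stable = True
        for s, actions in transitions.items():
            if not actions:
                continue
            best = max(actions,
                       key=lambda a: sum(p * values[s2]
                                         for s2, p in actions[a]))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:  # no state changed its action: the policy is optimal
            return policy
```

In a real implementation the evaluation step can also solve the linear system exactly rather than iterating a fixed number of sweeps; either way, the max over actions appears only in the improvement step, which is what distinguishes this from value iteration.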
If you feel that something is missing in the public interface of `MDPProblem`, or that you need to use non-public variables and methods (whose names start with `_`), discuss it with your instructor.
Set up your Python environment and install the `kuimaze2` package in it. In the `kuimaze2` package, you will also find the script `example_mdp.py`, which shows how to work with the environment. It can be used as starting code for the implementation of both classes.
It may be convenient (as in `example_mdp.py`) to extract shared parts into a common ancestor of both classes:

```python
class MDPAgent:
    # Parts common to both methods/agents
    ...

class ValueIterationAgent(MDPAgent):
    # Parts specific for value iteration
    ...

class PolicyIterationAgent(MDPAgent):
    # Parts specific for policy iteration
    ...
```
Submit your solution to the task 08-MDPs: either the module `mdp_agent.py` alone, or a ZIP archive with the module `mdp_agent.py` and any other modules you created that your agent needs/imports. These files must be in the root of the archive; the archive must not contain any directories! Do not include/submit any modules that you received from us!
Learn about the evaluation and scoring of the task.