Your task is to implement the value iteration and policy iteration methods to find the optimal strategy (policy) for the given MDP.
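For orientation, both methods can be read as two ways of solving the Bellman optimality equation. The notation below is an assumption made for this sketch and is not taken from the assignment: $V^*$ is the optimal state value, $\pi^*$ the optimal policy, $P(s' \mid s, a)$ the transition probability, $R(s, a, s')$ the reward, and $\gamma$ the discount factor. The environment may define rewards differently, for example per state.

```latex
V^*(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^*(s') \bigr]

\pi^*(s) = \arg\max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^*(s') \bigr]
```

Value iteration applies the first equation repeatedly as an update rule until the values stop changing. Policy iteration alternates between evaluating a fixed policy and improving it greedily with the second equation.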
In the module mdp_agent.py, implement two classes:
ValueIterationAgent
PolicyIterationAgent
The interface of both classes is identical, both must implement the following methods:
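__init__ takes three parameters: env: MDPProblem, gamma: float, epsilon: float
find_policy returns a Policy
The parameters and the return value are as follows:
env is the environment, an instance of kuimaze2.MDPProblem.
gamma is the discount factor, a value from the interval (0,1).
epsilon is the error tolerance used to decide when the computed values are accurate enough.
find_policy() returns the policy, which assigns an action (kuimaze2.Action) to each state (kuimaze2.State).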
ValueIterationAgent.find_policy() must compute the policy by value iteration, and PolicyIterationAgent.find_policy() must compute it by policy iteration.
MDPProblem
_
kuimaze2
example_mdp.py
class MDPAgent:
    # Parts common to both methods/agents
    ...

class ValueIterationAgent(MDPAgent):
    # Parts specific for value iteration
    ...

class PolicyIterationAgent(MDPAgent):
    # Parts specific for policy iteration
    ...
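The skeleton above could be fleshed out roughly as follows. This is a minimal sketch under stated assumptions, not the reference solution: the environment is a hypothetical plain dictionary (`toy`) rather than a real kuimaze2.MDPProblem, the policy is a plain `dict`, and the stopping rule (largest value change below `epsilon`) is a simple choice that the assignment may define differently. No kuimaze2 calls are used, so the environment access would have to be adapted to the real interface.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# ASSUMPTION: transitions[s][a] is a list of (probability, next_state, reward).
# Terminal states have no actions. This stands in for the real environment;
# it is NOT the kuimaze2.MDPProblem interface.
Transitions = Dict[State, Dict[Action, List[Tuple[float, State, float]]]]


class MDPAgent:
    """Parts common to both agents: stored values and one-step lookahead."""

    def __init__(self, env: Transitions, gamma: float, epsilon: float):
        self.env, self.gamma, self.epsilon = env, gamma, epsilon
        self.values: Dict[State, float] = {s: 0.0 for s in env}

    def q_value(self, s: State, a: Action) -> float:
        # Expected reward plus discounted value of the successor state.
        return sum(p * (r + self.gamma * self.values[s2])
                   for p, s2, r in self.env[s][a])


class ValueIterationAgent(MDPAgent):
    def find_policy(self) -> Dict[State, Action]:
        while True:
            # Bellman optimality backup for every state (terminals stay 0).
            new = {s: max((self.q_value(s, a) for a in acts), default=0.0)
                   for s, acts in self.env.items()}
            delta = max(abs(new[s] - self.values[s]) for s in self.env)
            self.values = new
            if delta < self.epsilon:  # assumed stopping rule
                break
        # Extract the greedy policy for all non-terminal states.
        return {s: max(acts, key=lambda a: self.q_value(s, a))
                for s, acts in self.env.items() if acts}


class PolicyIterationAgent(MDPAgent):
    def find_policy(self) -> Dict[State, Action]:
        # Start from an arbitrary policy: the first listed action everywhere.
        policy = {s: next(iter(acts)) for s, acts in self.env.items() if acts}
        while True:
            # Policy evaluation: iterate the fixed-policy backup in place.
            while True:
                delta = 0.0
                for s, a in policy.items():
                    v = self.q_value(s, a)
                    delta = max(delta, abs(v - self.values[s]))
                    self.values[s] = v
                if delta < self.epsilon:
                    break
            # Policy improvement: switch only to strictly better actions.
            stable = True
            for s, acts in self.env.items():
                if not acts:
                    continue
                best = max(acts, key=lambda a: self.q_value(s, a))
                if self.q_value(s, best) > self.q_value(s, policy[s]) + 1e-12:
                    policy[s] = best
                    stable = False
            if stable:
                return policy


# Hypothetical three-state MDP used only to exercise the sketch.
toy: Transitions = {
    "A": {"stay": [(1.0, "A", 0.0)],
          "go": [(0.8, "B", 0.0), (0.2, "A", 0.0)]},
    "B": {"back": [(1.0, "A", 0.0)],
          "exit": [(1.0, "T", 1.0)]},
    "T": {},
}

vi_policy = ValueIterationAgent(toy, gamma=0.9, epsilon=1e-6).find_policy()
pi_policy = PolicyIterationAgent(toy, gamma=0.9, epsilon=1e-6).find_policy()
print(vi_policy, pi_policy)
```

On this toy problem both agents should agree, choosing "go" in state A and "exit" in state B. Putting the lookahead `q_value` in the base class mirrors the split suggested by the skeleton: the backup is shared, and only the iteration scheme differs between the two agents.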
08-MDPs
Learn about evaluation and scoring of the task.