You will use the kuimaze2.RLProblem environment when learning the best strategy for an unknown MDP using reinforcement learning methods. It is used in the fourth compulsory task, 11-RL.
After creating an instance of the RLProblem class (see Usage), you can use the following methods:
- env.get_states(): returns a list of all valid states of the environment, i.e., State instances.
- env.get_action_space(): returns a list of all actions that can be used in the environment; unlike MDPProblem.get_actions(state), the set of actions is the same in every state.
- env.sample_action(action_probs): returns a random action, optionally sampled according to the distribution action_probs.
- env.reset(): starts a new episode and returns the agent's initial state.
- env.step(action): attempts to perform the action and returns a triple (new_state, reward, episode_finished), where new_state is the resulting state (None if the episode has ended), reward is a float, and episode_finished is True once the episode has terminated; after that, step() raises an exception until reset() is called again.
- env.render(): updates the graphical display; see help(env.render) for its options and RLAgent.render() in example_rl.py for an example of its use.
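Together, these methods form the usual agent-environment interaction loop. Given an environment created with graphics=True (see Usage below), one episode with random actions might look like the following sketch (the loop structure is illustrative; the methods are those listed above):

state = env.reset()                  # start a new episode
episode_finished = False
while not episode_finished:
    action = env.sample_action()     # here: a random action
    state, reward, episode_finished = env.step(action)
    env.render()                     # update the graphical display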
The RLProblem environment is created the same way as MDPProblem, but the usage is different.
Environment import:
>>> from kuimaze2 import Map, RLProblem
Creating a map to initialize the environment:
>>> MAP = "SG"
>>> map = Map.from_string(MAP)
Creating a deterministic environment with graphical display:
>>> env1 = RLProblem(map, graphics=True)
Creating a non-deterministic environment (specifying the probabilities of where the agent will actually move):
>>> env2 = RLProblem(map, action_probs=dict(forward=0.8, left=0.1, right=0.1, backward=0.0))
List of all valid states in the environment:
>>> env2.get_states()
[State(r=0, c=0), State(r=0, c=1)]
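In an RL agent, the list of states typically serves as the keys of a value table or Q-table, e.g. (an illustrative snippet, not part of the API):

>>> values = {state: 0.0 for state in env2.get_states()}
>>> values
{State(r=0, c=0): 0.0, State(r=0, c=1): 0.0}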
List of all actions that can be performed in some state of the environment:
>>> env2.get_action_space()
[<Action.UP: 0>, <Action.RIGHT: 1>, <Action.DOWN: 2>, <Action.LEFT: 3>]
A randomly selected action may also be useful:
>>> env2.sample_action()  # The result can be any of the possible actions.
<Action.UP: 0>
The step() method attempts to perform the selected action in the environment, but it fails if no episode is running:
>>> env2.step(env2.sample_action())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...\kuimaze2\rl.py", line 60, in step
    raise NeedsResetError(
kuimaze2.exceptions.NeedsResetError: RLProblem: Episode terminated. You must call reset() first.
Calling the reset() method will return the initial state of the agent for the given episode:
>>> state = env2.reset()
>>> state
State(r=0, c=0)
Now we can call the step() method:
>>> action = env2.sample_action()
>>> action
<Action.DOWN: 2>
>>> new_state, reward, episode_finished = env2.step(action)
>>> new_state
State(r=0, c=0)
>>> reward
-0.04
>>> episode_finished
False
Note that although the chosen action was Action.DOWN, the agent remained in the same state (the move would have led out of the map), and it received a reward of -0.04 for the step.
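Because env2 is non-deterministic, repeating the same action in the same state does not always have the same outcome. The action_probs can be checked empirically with the following standalone sketch (run separately from the session above; the counting logic is illustrative). On the 1x2 map used here, the RIGHT action taken in the initial state should end the episode in roughly 80% of the attempts, because both lateral deviations would lead out of the map:

actions = env2.get_action_space()   # [UP, RIGHT, DOWN, LEFT]
right = actions[1]
trials = 1000
finished = 0
for _ in range(trials):
    env2.reset()                    # each trial is a fresh episode
    _, _, done = env2.step(right)
    if done:                        # the forward move reached the goal G
        finished += 1
print(finished / trials)            # should be close to 0.8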
So let's take random steps until the episode ends:

>>> while not episode_finished:
...     action = env2.sample_action()
...     new_state, reward, episode_finished = env2.step(action)
...     print(f"{state=} {action=} {reward=} {new_state=} {episode_finished=}")
...     state = new_state
...
state=State(r=0, c=0) action=<Action.DOWN: 2> reward=-0.04 new_state=State(r=0, c=0) episode_finished=False
state=State(r=0, c=0) action=<Action.RIGHT: 1> reward=-0.04 new_state=State(r=0, c=1) episode_finished=False
state=State(r=0, c=1) action=<Action.UP: 0> reward=1.0 new_state=None episode_finished=True
Another call to the step() method would again raise an exception. The episode has ended; to start a new one, we need to call reset() again.
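These methods are all you need to implement, for example, tabular Q-learning for the 11-RL task. Below is a minimal sketch, not a prescribed solution; the hyperparameters alpha, gamma, epsilon and the episode count are illustrative assumptions:

import random

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.2):
    # Q-table: one value for every (state, action) pair, initialized to zero.
    q = {(s, a): 0.0 for s in env.get_states() for a in env.get_action_space()}
    for _ in range(episodes):
        state = env.reset()
        episode_finished = False
        while not episode_finished:
            # Epsilon-greedy selection: explore randomly, otherwise exploit.
            if random.random() < epsilon:
                action = env.sample_action()
            else:
                action = max(env.get_action_space(), key=lambda a: q[(state, a)])
            new_state, reward, episode_finished = env.step(action)
            # A terminal transition (new_state is None) has no future value.
            future = 0.0 if new_state is None else max(
                q[(new_state, a)] for a in env.get_action_space())
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = new_state
    return q

A greedy policy can then be read off the Q-table, e.g. policy = {s: max(env.get_action_space(), key=lambda a: q[(s, a)]) for s in env.get_states()}.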