Your task is to implement the Q-learning algorithm to find the best strategy in an environment about which you have only incomplete information. You can only perform available actions and observe their effects (reinforcement learning).
In the rl_agent.py module, implement the RLAgent class. The class must implement the following interface:
__init__(self, env: RLProblem, gamma: float, alpha: float): initialization of the agent
learn_policy(self) -> Policy: learn and return the best policy found

The meaning of the parameters and of the return value:

env: the environment, an instance of the kuimaze2.RLProblem class
gamma: the discount factor, a number from the interval (0, 1)
alpha: the learning rate
Policy: the result of learn_policy(), a dictionary mapping each kuimaze2.State of the environment to the kuimaze2.Action to be taken in it
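To make the interface concrete, here is a minimal sketch of a possible agent. It is an illustration, not a reference solution: it assumes that kuimaze2.Action is an enum whose members can be listed with list(kuimaze2.Action), that every state returned by env.step() is also among env.get_states(), and the episode count and exploration rate are arbitrary illustrative defaults. The update inside the loop is the standard Q-learning rule Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

import random

import kuimaze2


class RLAgent:
    """Q-learning agent; an illustrative sketch, not a reference solution."""

    def __init__(self, env: kuimaze2.RLProblem, gamma: float, alpha: float):
        self.env = env
        self.gamma = gamma  # discount factor from (0, 1)
        self.alpha = alpha  # learning rate
        # Assumption: kuimaze2.Action is an enum and can be iterated over.
        self.actions = list(kuimaze2.Action)
        # Q-table as a dictionary of dictionaries: q_table[state][action].
        self.q_table = {
            state: {action: 0.0 for action in self.actions}
            for state in env.get_states()
        }

    def learn_policy(self, episodes=1000, epsilon=0.2):
        """Run Q-learning episodes, then return the greedy policy
        (a dict mapping kuimaze2.State to kuimaze2.Action)."""
        for _ in range(episodes):  # mind the 20-second limit of the real task
            state = self.env.reset()
            episode_finished = False
            while not episode_finished:
                # Epsilon-greedy choice: mostly the best known action,
                # sometimes a random one to keep exploring.
                if random.random() < epsilon:
                    action = random.choice(self.actions)
                else:
                    action = max(self.q_table[state], key=self.q_table[state].get)
                next_state, reward, episode_finished = self.env.step(action)
                # Q-learning update toward the temporal-difference target.
                best_next = max(self.q_table[next_state].values())
                td_error = reward + self.gamma * best_next - self.q_table[state][action]
                self.q_table[state][action] += self.alpha * td_error
                state = next_state
        # A complete policy: the best known action for every state.
        return {
            state: max(q_values, key=q_values.get)
            for state, q_values in self.q_table.items()
        }

In practice you would tune the number of episodes and the exploration rate (or decay epsilon over time) so that learning fits within the time limit.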
Note: The environment (env) is an instance of the RLProblem class. Its initialization and its visualization methods are the same as for MDPProblem, but working with it is different: to learn anything about the environment, the agent must execute actions. We have no map, and the environment can be explored only through its main method, env.step(action). The environment simulator itself keeps track of the current state.
The use of the environment is demonstrated in the example_rl.py script that comes with the kuimaze2 package.
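Before implementing learning itself, it can help to watch the step() contract in action. A minimal sketch of one randomly played episode, assuming the available actions are passed in as a list (e.g., list(kuimaze2.Action) if Action is an enum):

import random

def run_random_episode(env, actions):
    """Play one episode with random actions to see what env.step() returns."""
    state = env.reset()  # the simulator tells us the starting state
    episode_finished = False
    total_reward = 0.0
    while not episode_finished:
        action = random.choice(actions)
        # step() returns the new state, the immediate reward,
        # and a flag saying whether the episode has ended.
        next_state, reward, episode_finished = env.step(action)
        total_reward += reward
        state = next_state  # a learning agent would update its q-values here
    return total_reward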
Your code will be called by the evaluation script approximately like this:
import kuimaze2
import rl_agent

env = kuimaze2.RLProblem(...)  # here the script creates the environment

# Calls of your code
agent = rl_agent.RLAgent(env, gamma, alpha)
policy = agent.learn_policy()  # 20 second limit

# Evaluation of one episode using your policy
state = env.reset()
total_reward = 0
episode_finished = False
while not episode_finished:
    action = policy[state]
    next_state, reward, episode_finished = env.step(action)
    total_reward += reward
    state = next_state
During the implementation, you will probably need to work with the Q-function. In our discrete world, it will take the form of a table. This table can be represented in various ways, for example:
as a dictionary of dictionaries named q_table, with individual q-values accessed as q_table[state][action], or
as a numpy array indexed by state and action.
In real RL tasks, we do not have a “map” of the environment and often do not even know the set of all states. RL then never really ends, because we can never be sure that we have already reached all reachable states (and learned their q-values well enough). In our task, however, learning must end, and you must return a complete policy, i.e., the best action for each state. There must therefore be a way to find out the set of all states: the list of all valid states can be obtained with the get_states() method.
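Because get_states() enumerates all valid states, it can be used to make sure the returned policy is complete. A sketch, assuming a dictionary-of-dictionaries q_table as discussed above:

# A complete policy: the greedy action for every valid state.
policy = {
    state: max(q_table[state], key=q_table[state].get)
    for state in env.get_states()
}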
The example_rl.py script already contains the initialization of q_table in the form of a dictionary of dictionaries. If you choose a different representation of the q-value table, the initialization needs to be appropriately adjusted.
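If you opt for a numpy array instead, only the initialization and the indexing change. The following sketch is hypothetical: the names state.r, state.c and the int(action) conversion are assumptions about the kuimaze2 classes, so check the real API before relying on them.

import numpy as np

def init_q_array(n_rows: int, n_cols: int, n_actions: int) -> np.ndarray:
    """Zero-initialized q-table as a 3-D numpy array (rows x cols x actions)."""
    return np.zeros((n_rows, n_cols, n_actions))

# Hypothetical indexing, to be adapted to the actual classes:
# q_value = q_array[state.r, state.c, int(action)]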
Familiarize yourself with the evaluation and scoring of the task.