Your task is to implement the Q-learning algorithm to find the best strategy in an environment about which you have only incomplete information. You can only perform available actions and observe their effects (reinforcement learning).
In the `rl_agent.py` module, implement the `RLAgent` class. The class must implement the following interface:
| method | input parameters | output parameters | explanation |
|---|---|---|---|
| `__init__` | `env: RLProblem`, `gamma: float`, `alpha: float` | none | Agent initialization. |
| `learn_policy` | none | `Policy` | Returns the best strategy, i.e., a dictionary of pairs (state, action). |
- `env` is the environment, i.e., an object of type `kuimaze2.RLProblem`,
- `gamma` is the so-called "discount factor" from the range (0, 1),
- `alpha` is the so-called "learning rate"; the passed value is only a recommendation: if you want to use a different one, or change/adapt it during learning, feel free to do so.
The return value of the `learn_policy()` method must be a policy represented as a dictionary, where each key is a state (an instance of the `kuimaze2.State` class) and the value is the optimal action for that state (an instance of the `kuimaze2.Action` class). The policy must contain an action for all free states, including terminal ones. For terminal states, the specific chosen action does not matter.
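Such a policy dictionary can be derived from a learned q-table by taking the action with the highest q-value in each state. A minimal sketch, assuming a dict-of-dicts q-table and using plain strings in place of the `kuimaze2` `State`/`Action` types:

```python
def extract_policy(q_table):
    """Return {state: best_action} by picking the argmax action per state.

    Assumes q_table is a dict of dicts: q_table[state][action] -> float.
    """
    return {
        state: max(actions, key=actions.get)  # action with the highest q-value
        for state, actions in q_table.items()
    }

# Tiny illustrative q-table (strings stand in for kuimaze2 State/Action)
q_table = {
    "s0": {"UP": 0.1, "DOWN": 0.7},
    "s1": {"UP": -0.2, "DOWN": -0.5},
}
policy = extract_policy(q_table)
# policy == {"s0": "DOWN", "s1": "UP"}
```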
The `learn_policy()` method of your `RLAgent` will implement the Q-learning algorithm. Of course, you can/should decompose it into shorter functions/methods.
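The core of Q-learning is the temporal-difference update applied after every executed step. A minimal sketch of the update rule (the function and variable names here are my own, not part of the `kuimaze2` interface; the q-table is assumed to be a dict of dicts):

```python
def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """One Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))

    Assumes q_table is a dict of dicts: q_table[state][action] -> float.
    """
    best_next = max(q_table[next_state].values())  # max_a' Q(s', a')
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])

# Illustrative example (strings stand in for kuimaze2 State/Action)
q = {"s0": {"R": 0.0}, "s1": {"R": 1.0}}
q_update(q, "s0", "R", reward=0.5, next_state="s1", alpha=0.5, gamma=0.9)
# q["s0"]["R"] == 0.5 * (0.5 + 0.9 * 1.0 - 0.0) == 0.7
```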
If you think that the public interface of `RLProblem` is insufficient, or that you need to use non-public variables and methods (whose names start with `_`), discuss it with your instructor.
Note: The environment (`env`) is an instance of the `RLProblem` class. Initialization of the environment and the visualization methods are the same as for `MDPProblem`, but working with the environment is different. To learn anything about the environment, you must execute actions: we do not have a map, and the environment can only be explored using the main method `env.step(action)`. The environment simulator keeps track of the current state.
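Putting the step interface together with the update rule, one training episode might look roughly like this. This is only a sketch under the assumption that `env.reset()` returns the start state and `env.step(action)` returns `(next_state, reward, episode_finished)`, as in the evaluation snippet; the epsilon-greedy choice and the dict-of-dicts q-table are my own illustrative choices:

```python
import random

def run_episode(env, q_table, alpha, gamma, epsilon=0.2):
    """One exploration episode with epsilon-greedy action selection.

    Assumes q_table is a dict of dicts: q_table[state][action] -> float.
    """
    state = env.reset()
    episode_finished = False
    while not episode_finished:
        if random.random() < epsilon:
            action = random.choice(list(q_table[state]))          # explore
        else:
            action = max(q_table[state], key=q_table[state].get)  # exploit
        next_state, reward, episode_finished = env.step(action)
        # Q-learning update toward r + gamma * max_a' Q(s', a')
        best_next = max(q_table[next_state].values())
        q_table[state][action] += alpha * (
            reward + gamma * best_next - q_table[state][action]
        )
        state = next_state
```

In practice you would run many such episodes inside `learn_policy()` and only then extract the greedy policy.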
Set up an updated version of the `kuimaze2` package in your working environment.
In the `kuimaze2` package, you will also find the script `example_rl.py`. The script contains a skeleton of the `RLAgent` class and demonstrates a random walk through the environment, initialization of the q-value table, and visualization. It can be used as starting code for your implementation of the `RLAgent` class.
Your code will be called by the evaluation script approximately like this:

```python
import kuimaze2
import rl_agent

env = kuimaze2.RLProblem(...)  # here the script creates the environment

# Calls of your code
agent = rl_agent.RLAgent(env, gamma, alpha)
policy = agent.learn_policy()  # 20 second limit

# Evaluation of one episode using your policy
total_reward = 0
state = env.reset()
episode_finished = False
while not episode_finished:
    action = policy[state]
    next_state, reward, episode_finished = env.step(action)
    total_reward += reward
    state = next_state
```
During the implementation, you will probably need to work with the Q-function. In our discrete world, it takes the form of a table. This table can be represented in various ways, for example:

- as a dictionary of dictionaries `q_table` (as shown in the `example_rl.py` module), where you can access individual elements as `q_table[state][action]`;
- as a `numpy` array indexed by three "coordinates": r, c, action.
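The `numpy` variant can be sketched as follows; the grid dimensions, the action count, and the mapping of states to (r, c) cells are assumptions for illustration, not part of the `kuimaze2` interface:

```python
import numpy as np

# Illustrative grid and action count (not taken from the actual environment)
n_rows, n_cols, n_actions = 4, 5, 4
q_table = np.zeros((n_rows, n_cols, n_actions))

# Access a single q-value by row, column, and action index
r, c, action = 2, 3, 1
q_table[r, c, action] = 0.5
```

This representation requires converting each `State` to its (r, c) coordinates and each `Action` to an integer index, which is extra bookkeeping the dict-of-dicts form avoids.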
In real RL tasks, we do not have a "map" of the environment and often do not even know the set of all states. RL then never ends, because we can never be sure that we have already reached all reachable states (and that we have learned all q-values well enough). But in our task, RL must end and you must return a complete strategy, i.e., the best action for each state. Therefore, there must be a way to find out the set of all states. The list of all valid states can be obtained using the `get_states()` method.
The `example_rl.py` script already contains the initialization of `q_table` in the form of a dictionary of dictionaries. If you choose a different representation of the q-value table, adjust the initialization accordingly.
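For illustration, a dict-of-dicts initialization can be written as a small helper. This is a sketch: the states would come from `env.get_states()`, while how you enumerate the actions depends on `kuimaze2` (see `example_rl.py`); here both are simply passed in:

```python
def init_q_table(states, actions, initial_value=0.0):
    """Build a dict-of-dicts q-table: q_table[state][action] -> float.

    `states` would come from env.get_states(); `actions` is whatever
    enumeration of kuimaze2 actions you use (see example_rl.py).
    """
    return {state: {action: initial_value for action in actions}
            for state in states}

# Illustrative use (strings stand in for kuimaze2 State/Action)
q_table = init_q_table(["s0", "s1"], ["UP", "DOWN"])
# q_table["s0"]["UP"] == 0.0
```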
Submit your solution to the 11-RL assignment: either the `rl_agent.py` module alone, or a ZIP archive with the `rl_agent.py` module and any other modules you created that your agent needs/imports. These files must be in the root of the archive; the archive must not contain any directories! Do not submit any modules that you received from us!
Familiarize yourself with the evaluation and scoring of the task.