In MDP and RL tasks, the functions find_policy_…() and learn_policy() should return a policy. The specification states that the policy must be represented as a dictionary. However, students sometimes submit solutions in which the function returns something else, or in which the contents of the dictionary are not formally correct, which suggests they did not fully understand the specification. As solution authors, you should be able to check for yourself whether the function's return value meets the requirements. How can you do that?
So what requirements should the returned strategy (policy) meet?
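As an illustration, here is a minimal sketch of such a check in Python. Beyond the dictionary requirement stated above, it assumes that every state must appear as a key and that every value must be an action valid in that state. These two rules, the helper name `policy_problems`, and its `states` and `valid_actions` parameters are hypothetical, so adapt them to the actual specification (for example, if terminal states are handled differently).

```python
from typing import Callable, Hashable, Iterable


def policy_problems(policy,
                    states: Iterable[Hashable],
                    valid_actions: Callable[[Hashable], Iterable]) -> list:
    """Return human-readable problems found in `policy`.

    An empty list means the policy passed every check.
    ASSUMPTION: each state needs exactly one valid action as its value.
    """
    # The one requirement taken directly from the text: it must be a dict.
    if not isinstance(policy, dict):
        return [f"policy is {type(policy).__name__}, expected dict"]

    problems = []
    states = list(states)

    # Every expected state must be a key, mapped to a valid action.
    for state in states:
        if state not in policy:
            problems.append(f"missing state {state!r}")
        elif policy[state] not in list(valid_actions(state)):
            problems.append(
                f"invalid action {policy[state]!r} for state {state!r}")

    # Keys that are not states of the problem are reported as well.
    for state in policy:
        if state not in states:
            problems.append(f"unknown state {state!r}")

    return problems


if __name__ == "__main__":
    # Toy data for demonstration only; a real check would take the
    # states and actions from the environment the task provides.
    toy_states = [(0, 0), (0, 1)]
    toy_policy = {(0, 0): "up", (0, 1): "sideways"}
    for problem in policy_problems(toy_policy, toy_states,
                                   lambda s: ["up", "down"]):
        print(problem)
```

In a pytest test, the same helper can be used as a single assertion, for example `assert policy_problems(policy, states, valid_actions) == []`, so that a failing test prints the list of problems found.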
pytest
Automated tests let you: