In the MDP and RL tasks, the functions find_policy_…() and learn_policy() must return a policy. The specification states that the policy is represented by a dictionary. Nevertheless, some submitted solutions return something else, or return a dictionary whose contents are not formally correct, which suggests that the specification was not fully understood. As the author of a solution, you should be able to check for yourself whether your function's return value meets the requirements. How can you do that?

Strategy requirements

So what requirements should the returned strategy (policy) meet?

  1. What data type should the strategy be represented by?
  2. How many entries should this dictionary have? How do I find this count from the environment (env)?
  3. What are dictionary keys supposed to represent? What type should they be?
  4. What are dictionary values? What type should they be?
  5. Does the dictionary contain all the keys?
  6. What specific strategy should be returned for some simple environment?
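
As one possible starting point, questions 1 to 5 can be turned into a small checker function. This is only a sketch under stated assumptions: TinyEnv is a made-up stand-in for the course environment, and the method names get_all_states() and get_actions(state) are hypothetical, so substitute whatever your env actually provides. It also assumes that every state, including terminal ones, must appear as a key; check the task specification on that point.

```python
# Hypothetical stand-in for the course environment; the real env API may differ.
class TinyEnv:
    """Three states in a row, identified by (x, y) tuples."""

    def get_all_states(self):
        return [(0, 0), (1, 0), (2, 0)]

    def get_actions(self, state):
        return ["LEFT", "RIGHT"]


def check_policy(policy, env):
    """Return a list of problems found; an empty list means the policy passed."""
    # Question 1: the policy must be a dictionary.
    if not isinstance(policy, dict):
        return [f"policy must be a dict, got {type(policy).__name__}"]

    problems = []
    states = list(env.get_all_states())

    # Question 2: one entry per state of the environment (assumption).
    if len(policy) != len(states):
        problems.append(f"expected {len(states)} entries, got {len(policy)}")

    # Questions 4 and 5: every state is present and maps to a valid action.
    for state in states:
        if state not in policy:
            problems.append(f"missing state {state!r}")
        elif policy[state] not in env.get_actions(state):
            problems.append(f"invalid action {policy[state]!r} for state {state!r}")

    # Question 3: keys must be states of the environment, nothing else.
    for key in policy:
        if key not in states:
            problems.append(f"unexpected key {key!r}")

    return problems
```

Returning a list of messages instead of raising on the first failure is a design choice: one run then reports everything that is wrong with the returned dictionary.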


  • Try to implement the above requirement checks directly in Python.
  • Organize your code appropriately so that you can easily run these tests through the pytest framework.
  • When implementing the individual algorithms (value iteration, policy iteration, …), you will certainly use support functions for sub-tasks, e.g. for policy evaluation. Think about the requirements for these support functions as well, and implement them as pytest tests.
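
A support function can be tested in the same spirit. The sketch below is illustrative only: the three-state chain, the step() dynamics, the discount factor, and the name evaluate_policy() are all invented for this example and are not part of the course code. It shows the layout pytest expects, namely plain functions whose names start with test_ and which use bare assert statements.

```python
import math

# Toy problem invented for this example: states 0, 1, 2 in a row, 2 is terminal.
GAMMA = 0.9
STATES = [0, 1, 2]
TERMINAL = 2


def step(state, action):
    """Deterministic toy dynamics: return (next_state, reward)."""
    nxt = min(state + 1, 2) if action == "RIGHT" else max(state - 1, 0)
    return nxt, (1.0 if nxt == TERMINAL else 0.0)


def evaluate_policy(policy, sweeps=100):
    """Iterative policy evaluation on the toy chain (hypothetical helper)."""
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            if s == TERMINAL:
                continue  # the value of the terminal state stays 0
            nxt, reward = step(s, policy[s])
            values[s] = reward + GAMMA * values[nxt]
    return values


def test_evaluation_returns_value_for_every_state():
    values = evaluate_policy({0: "RIGHT", 1: "RIGHT", 2: "RIGHT"})
    assert set(values) == set(STATES)


def test_evaluation_matches_hand_computed_values():
    # By hand: V(2) = 0, V(1) = 1 + 0.9 * 0 = 1, V(0) = 0 + 0.9 * 1 = 0.9
    values = evaluate_policy({0: "RIGHT", 1: "RIGHT", 2: "RIGHT"})
    assert values[2] == 0.0
    assert math.isclose(values[1], 1.0)
    assert math.isclose(values[0], 0.9)
```

Saved in a file whose name starts with test_ (for example test_policy_evaluation.py), these functions are collected and run by calling pytest in that directory. In a real solution the tests would import the support function from your own module instead of defining it next to the tests.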

Why tests?

Automated tests let you:

  • better understand and clarify the individual requirements, including the interfaces of individual functions,
  • easily run a whole batch of tests at once (and thus verify compliance with the tested specification),
  • verify that changes to the code did not alter important (tested) behavior.
courses/be5b33kui/tutorials/tests.txt · Last modified: 2023/04/24 15:48 by gamafili