Tests

In the MDP and RL tasks, the functions find_policy_…() and learn_policy() should return a policy. The specification states that the policy must be represented by a dictionary. However, students sometimes submit solutions in which the function returns something else, or in which the contents of the dictionary are not formally correct, which suggests that they did not read the specification carefully. As the author of a solution, you should be able to check for yourself whether your function's return value meets the requirements. How can you do that?

Strategy requirements

So what requirements should the returned strategy (policy) meet?

  1. What data type should the strategy be represented by?
  2. How many entries should this dictionary have? How do I find this count from the environment (env)?
  3. What should the dictionary keys represent? What type should they be?
  4. What should the dictionary values represent? What type should they be?
  5. Does the dictionary contain every required key, with none missing?
  6. What specific strategy should be returned for some simple environment?
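
Several of the questions above can be turned into an automated check. The following is only a minimal sketch, not the course's reference test: the name check_policy and its parameters states and actions are hypothetical, and it assumes that every key should be one of the environment's states and every value one of the allowed actions. How to obtain the states and actions from env depends on the environment's API and is deliberately left out. If the specification allows a special value such as None for some states, include it in actions.

```python
def check_policy(policy, states, actions):
    """Return a list of problems found in `policy` (empty list = no problems).

    Hypothetical helper: `states` and `actions` are supplied by the caller;
    how to read them from `env` is not shown here. States must be hashable.
    """
    # Question 1: the policy must be a dictionary.
    if not isinstance(policy, dict):
        return [f"policy is {type(policy).__name__}, expected dict"]

    states = list(states)
    actions = list(actions)
    problems = []

    # Question 2: one entry per expected state.
    if len(policy) != len(states):
        problems.append(f"policy has {len(policy)} entries, expected {len(states)}")

    # Questions 3 and 5: every expected state is a key, and nothing else is.
    missing = [s for s in states if s not in policy]
    if missing:
        problems.append(f"missing keys: {missing}")
    unexpected = [k for k in policy if k not in states]
    if unexpected:
        problems.append(f"unexpected keys: {unexpected}")

    # Question 4: every value is one of the allowed actions.
    invalid = {k: v for k, v in policy.items() if v not in actions}
    if invalid:
        problems.append(f"invalid values: {invalid}")

    return problems


# Toy data invented for illustration; it does not come from any real environment.
toy_states = [(0, 0), (0, 1)]
toy_actions = ["LEFT", "RIGHT"]
print(check_policy({(0, 0): "RIGHT", (0, 1): "LEFT"}, toy_states, toy_actions))  # prints []
```

Returning a list of messages instead of raising on the first failure lets a single run report everything that is wrong with the dictionary. Question 6 is not covered by this sketch, because it requires knowing the expected policy for a concrete simple environment.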

Tasks

Why tests?

Automated tests let you: