RL: Scoring

The evaluation is composed of:

  1. Automatic evaluation tests the performance of your agent in 5 environments. For each environment, we run the strategy found by your agent n times and compute the average sum of rewards it collects. We then compare this with the teacher's solution (an agent executing the optimal strategy). For each testing environment, your strategy earns one point if it reaches at least 80% of the teacher's average sum of rewards (see the sketch after this list).
  2. Manual evaluation is based on code quality (clean code).
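
To make the scoring criterion concrete, here is a minimal sketch of how such a check could look. It assumes a Gymnasium-style environment API; `average_return`, `earns_point`, and the `policy` callable are hypothetical names used for illustration, not part of the actual grader.

```python
import numpy as np


def average_return(env, policy, n_episodes=100):
    """Run a fixed policy for n episodes and average the sums of rewards.

    Hypothetical sketch of the grading criterion; `env` is assumed to
    follow the Gymnasium API and `policy` maps a state to an action.
    """
    returns = []
    for _ in range(n_episodes):
        state, _ = env.reset()
        episode_return, done = 0.0, False
        while not done:
            state, reward, terminated, truncated, _ = env.step(policy(state))
            episode_return += reward
            done = terminated or truncated
        returns.append(episode_return)
    return np.mean(returns)


def earns_point(agent_return, teacher_return, threshold=0.8):
    """One point if the agent reaches at least 80% of the teacher's value."""
    return agent_return >= threshold * teacher_return
```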
| Evaluated performance   | min | max | note                                                                          |
|--------------------------|-----|-----|-------------------------------------------------------------------------------|
| Quality of RL algorithm  | 0   | 5   | Evaluation of the algorithm by an automatic evaluation system.                |
| Quality of code          | 0   | 1   | Comments, structure, elegance, cleanliness of code, appropriate naming of variables… |

Quality of code (1 point):

You can follow PEP 8, the official style guide for Python. Most editors (PyCharm certainly) point out PEP 8 violations themselves. For inspiration, you can also look here, for example, or read about idiomatic Python on Medium.
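
If you want to check your submission before handing it in, the `pycodestyle` package (installable via pip) can report PEP 8 violations automatically; a minimal sketch, assuming your agent lives in a file such as `agent.py` (a placeholder name):

```python
import pycodestyle

# Check a source file against PEP 8.
# "agent.py" is a placeholder for whatever file you submit.
style = pycodestyle.StyleGuide()
report = style.check_files(["agent.py"])
print(f"PEP 8 violations found: {report.total_errors}")
```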