RL: Scoring

The evaluation is composed of:

  1. Automatic evaluation tests the performance of your agent in 5 environments. For each environment, we run the strategy found by your agent n times and compute the average sum of rewards it collects. We then compare this with the teacher's solution (an agent executing the optimal strategy). For each testing environment, your strategy earns one point if it reaches at least 80% of the teacher's average sum of rewards (see the sketch after this list).
  2. Manual evaluation is based on code quality (clean code).
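
To make the scoring criterion concrete, here is a minimal sketch of how such a check could look. It assumes a Gymnasium-style environment API; `average_return`, `earns_point`, and the `policy` callable are hypothetical names used for illustration, not part of the actual grader.

```python
import numpy as np


def average_return(env, policy, n_episodes=100):
    """Run a fixed policy for n episodes and average the sums of rewards.

    Hypothetical sketch of the grading criterion; `env` is assumed to
    follow the Gymnasium API and `policy` maps a state to an action.
    """
    returns = []
    for _ in range(n_episodes):
        state, _ = env.reset()
        episode_return, done = 0.0, False
        while not done:
            state, reward, terminated, truncated, _ = env.step(policy(state))
            episode_return += reward
            done = terminated or truncated
        returns.append(episode_return)
    return np.mean(returns)


def earns_point(agent_return, teacher_return, threshold=0.8):
    """One point if the agent reaches at least 80% of the teacher's value."""
    return agent_return >= threshold * teacher_return
```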
| Evaluated performance   | min | max | note                                                                          |
|--------------------------|-----|-----|-------------------------------------------------------------------------------|
| Quality of RL algorithm  | 0   | 5   | Evaluation of the algorithm by an automatic evaluation system.                |
| Quality of code          | 0   | 1   | Comments, structure, elegance, cleanliness of code, appropriate naming of variables… |

Quality of code (1 point):

You can follow PEP 8, the official style guide for Python. Most editors (PyCharm certainly) point out PEP 8 violations themselves. For inspiration, you can also look here, for example, or read about idiomatic Python on Medium.
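
If you want to check your submission before handing it in, the `pycodestyle` package (installable via pip) can report PEP 8 violations automatically; a minimal sketch, assuming your agent lives in a file such as `agent.py` (a placeholder name):

```python
import pycodestyle

# Check a source file against PEP 8.
# "agent.py" is a placeholder for whatever file you submit.
style = pycodestyle.StyleGuide()
report = style.check_files(["agent.py"])
print(f"PEP 8 violations found: {report.total_errors}")
```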