HW 4 - Reinforcement Learning

This homework serves as an introduction to deep reinforcement learning methods. Your goal will be to implement a particular approach to RL known as policy gradient, where a neural network learns to control a given dynamic system through interaction.
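As a rough illustration of the policy gradient idea, the sketch below computes discounted reward-to-go returns and the basic REINFORCE surrogate loss in plain Python. The function names, the discount factor, and the toy numbers are assumptions made for this example; they are not the signatures required in solution.py, and the losses in the assignment may differ.

```python
import math

def discounted_returns(rewards, gamma=0.99):
    """Reward-to-go: G_t = r_t + gamma * G_{t+1}, computed backwards in time."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def policy_gradient_loss(log_probs, returns):
    """Surrogate loss -mean(log pi(a_t | s_t) * G_t).

    Minimizing this loss with respect to the policy parameters
    corresponds to gradient ascent on the expected return.
    """
    assert len(log_probs) == len(returns)
    return -sum(lp * g for lp, g in zip(log_probs, returns)) / len(returns)

# Toy episode (hypothetical numbers): three steps, each action taken with probability 0.5.
rewards = [1.0, 0.0, 2.0]
returns = discounted_returns(rewards, gamma=0.5)
log_probs = [math.log(0.5)] * 3
loss = policy_gradient_loss(log_probs, returns)
```

In an actual implementation the log-probabilities would come from the neural network policy, so that an automatic differentiation framework can propagate the gradient of this loss to the network weights.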

Concretely, you will complete the implementation of 6 functions in the solution.py file. These functions implement different losses that we will use to train our neural network. In the end, you will be tasked with implementing a controller for an underactuated quadruped-robot system.

A Jupyter notebook, evaluation.ipynb, is provided to guide you through the implementation and the theory behind the methods.

All the necessary files and instructions are available at https://github.com/urob-ctu/hw4. We recommend using Python 3.10, as this version was used for all the supplementary code.

All the required libraries are listed in requirements.txt. To install them (assuming Python and pip are already installed), run

pip install -r requirements.txt

Submission and Evaluation

This assignment is worth a maximum of 15 points. There are 6 functions in the solution.py file that need to be completed, and a total of 12 points is distributed among them. The last 3 points are awarded for your WalkerPolicy, located in the WalkerPolicy.py file. The first point is awarded just for uploading a working policy class into the system; the next two points are awarded based on the distance traversed in the x direction.

You can also gain up to 2 bonus points for beating other students' implementations of WalkerPolicy! The students' policies will be ranked by the total travelled distance and based on your relative performance, you can be awarded extra points for this assignment.

Upload solution.py, WalkerPolicy.py and, if applicable, the stored weights of the policy in a zip archive to BRUTE. The submission deadline is 3.1.2024 23:59:00.
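One possible way to build the archive is sketched below using Python's standard zipfile module. The helper name make_submission, the archive name submission.zip, and the assumption that the files should sit at the root of the archive are illustrative only; check the repository instructions for the layout BRUTE actually expects.

```python
import zipfile
from pathlib import Path

def make_submission(files, archive="submission.zip"):
    """Zip the given files at the archive root (hypothetical helper).

    Raises FileNotFoundError if any listed file is missing, so an
    incomplete archive is never written silently.
    """
    missing = [f for f in files if not Path(f).is_file()]
    if missing:
        raise FileNotFoundError(f"missing files: {missing}")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            # arcname drops any directory prefix so files land at the archive root
            zf.write(f, arcname=Path(f).name)
    return archive
```

For example, make_submission(["solution.py", "WalkerPolicy.py"]) would create submission.zip in the current directory; a stored weights file, whatever you named it, can be appended to the list.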

In case of any questions, please do not hesitate to ask at tichyt11@fel.cvut.cz.