This homework serves as an introduction to deep reinforcement learning methods. You will implement a particular approach to RL known as policy gradient, in which a neural network learns to control a given dynamical system through interaction.
Your task is to complete the implementation of 6 functions in the solution.py file. These functions implement the different losses that we will use to train our neural network. In the end, you will be tasked with implementing a controller for an underactuated quadruped-robot system.
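To give a feel for what such a loss looks like, here is a minimal sketch of the basic policy gradient (REINFORCE-style) surrogate loss in plain Python. The function names, arguments, and the default discount factor are hypothetical and chosen only for illustration; they are not the signatures required in solution.py, and the actual assignment may use different loss variants.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for each step t."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def policy_gradient_loss(log_probs, rewards, gamma=0.99):
    """Surrogate loss -(1/T) * sum_t log pi(a_t | s_t) * G_t for one episode.

    log_probs[t] is the log-probability the policy assigned to the action
    taken at step t (hypothetical input, assumed to be given).
    """
    returns = discounted_returns(rewards, gamma)
    weighted = sum(lp * g for lp, g in zip(log_probs, returns))
    return -weighted / len(rewards)


# Tiny usage example on a three-step episode.
example_rewards = [1.0, 1.0, 1.0]
example_log_probs = [math.log(0.5)] * 3
example_loss = policy_gradient_loss(example_log_probs, example_rewards, gamma=0.5)
```

With an automatic differentiation framework, log_probs would be tensors produced by the network, and minimizing this quantity would increase the probability of actions that were followed by high returns.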
A Jupyter notebook evaluation.ipynb is provided to guide you through the implementation and the theory behind the methods.
All the necessary files and instructions are available at https://github.com/urob-ctu/hw4. We recommend using Python 3.10 for the implementation, as this version was used for all the supplementary code.
All the required libraries are listed in requirements.txt. To install them (assuming you have Python and pip installed), simply run:
pip install -r requirements.txt
This assignment is worth a maximum of 15 points. There are 6 functions that need to be completed in the solution.py file, and a total of 12 points is distributed among them. The last 3 points are awarded for your WalkerPolicy, located in the WalkerPolicy.py file. The first point is awarded just for uploading a working policy class into the system; the remaining two points are awarded based on the distance traversed in the x direction.
You can also gain up to 2 bonus points for beating other students' implementations of WalkerPolicy! The students' policies will be ranked by the total travelled distance, and based on your relative performance you can be awarded extra points for this assignment.
Upload solution.py, WalkerPolicy.py and possibly the stored weights of the policy in a zip archive to BRUTE. The deadline for the submission will be 3.1.2024 23:59:00.
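If you prefer to build the archive from a script, the following is a minimal sketch using only the Python standard library. The helper name make_submission, the default archive name submission.zip, and the weights file name mentioned below are assumptions for illustration; check the repository instructions for any naming that BRUTE actually expects.

```python
import zipfile
from pathlib import Path


def make_submission(files, archive_name="submission.zip"):
    """Pack the given files into a flat zip archive and return its path.

    Files that do not exist are skipped, since the weights file is optional.
    """
    archive = Path(archive_name)
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in files:
            path = Path(name)
            if path.is_file():
                # Store only the file name so the archive has no subfolders.
                zf.write(path, arcname=path.name)
    return archive
```

For example, make_submission(["solution.py", "WalkerPolicy.py", "weights.pt"]) would pack whichever of the three files exist in the current directory, where weights.pt is a placeholder for wherever you stored your policy weights.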
In case of any questions, please do not hesitate to ask at tichyt11@fel.cvut.cz.