This homework serves as an introduction to deep reinforcement learning methods. Your goal will be to implement a particular approach to RL known as policy gradient, where a neural network learns to control a given dynamic system through interaction.
Concretely, you will complete the implementation of six functions in the solution.py file. These functions implement the different losses that we will use to train our neural network. At the end, you will be tasked with implementing a controller for an underactuated quadruped-robot system.
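To give a feel for what such a loss looks like, here is a minimal sketch of the basic policy gradient (REINFORCE-style) surrogate loss in plain Python. The function names `discounted_returns` and `policy_gradient_loss` and the default `gamma` are assumptions made for illustration; they are not necessarily the names or signatures used in solution.py, and the actual assignment will most likely operate on tensors in an autodiff framework rather than on Python lists.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t.

    `rewards` is the list of rewards from one episode (hypothetical input format).
    """
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def policy_gradient_loss(log_probs, returns):
    """Surrogate loss -mean(log pi(a_t | s_t) * G_t) over one episode.

    Minimizing this value with respect to the policy parameters follows the
    policy gradient. Here the inputs are plain floats, so the function only
    illustrates the arithmetic, not the differentiation.
    """
    if len(log_probs) != len(returns):
        raise ValueError("log_probs and returns must have the same length")
    if not log_probs:
        raise ValueError("episode must contain at least one step")
    weighted = [lp * g for lp, g in zip(log_probs, returns)]
    return -sum(weighted) / len(weighted)
```

Actions that were followed by a high return get their log-probability pushed up, and actions followed by a low return are pushed up less (or pushed down if the return is negative). The notebook covers the theory and the variants of this loss in detail.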
A Jupyter notebook, task.ipynb, is provided to guide you through the implementation and the theory behind the methods.
All the necessary files and instructions are available at https://github.com/urob-ctu/hw5-rl.
This assignment is worth a maximum of 10+1 points. Six functions in the solution.py file need to be completed, and a total of 10 points is distributed among them. The last 1 point is awarded for your WalkerPolicy, located in the WalkerPolicy.py file. It must be capable of traveling at least 1 meter in the positive x-direction in less than five simulation seconds. You can gain up to 2 additional points from the tournament, where the distance traveled in 10 simulation seconds determines your rank among all submissions.
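Before submitting, it can help to check the walking requirement on a logged rollout. The sketch below is only an illustration: the helper name `passes_walker_requirement` and the input format (lists of simulation times and x-positions) are hypothetical, and it assumes the distance is measured relative to the starting x-position and that reaching it at any sample before the time limit counts. The official evaluation may measure this differently.

```python
def passes_walker_requirement(times, x_positions, min_distance=1.0, time_limit=5.0):
    """Return True if the robot is at least `min_distance` meters ahead of its
    starting x-position at some logged sample with simulation time < `time_limit`.

    `times` and `x_positions` are parallel lists logged during one rollout
    (a hypothetical logging format, not something the assignment prescribes).
    """
    if len(times) != len(x_positions):
        raise ValueError("times and x_positions must have the same length")
    if not times:
        raise ValueError("rollout must contain at least one sample")
    x_start = x_positions[0]
    return any(
        t < time_limit and (x - x_start) >= min_distance
        for t, x in zip(times, x_positions)
    )
```

The same log can be used to estimate the tournament score by reading off the displacement at 10 simulation seconds.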
Upload solution.py, WalkerPolicy.py, and, if applicable, the stored weights of the policy to BRUTE in a zip archive. The submission deadline is 4.1.2025 at 23:59:00.
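One possible way to assemble the archive is with Python's standard zipfile module. This is a sketch under assumptions: the helper `build_submission` is hypothetical, the weights file name in the usage comment is a placeholder, and placing all files at the top level of the archive is assumed rather than confirmed, so check the repository instructions for the exact layout BRUTE expects.

```python
import zipfile
from pathlib import Path


def build_submission(archive_path, required, optional=()):
    """Write the given files into a zip archive and return the stored names.

    Every path in `required` must exist; paths in `optional` (for example a
    weights file) are included only if they exist. Files are stored flat,
    by file name only, which is an assumption about the expected layout.
    """
    required = [Path(p) for p in required]
    missing = [str(p) for p in required if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing required files: {missing}")

    files = required + [Path(p) for p in optional if Path(p).is_file()]
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            zf.write(path, arcname=path.name)
    return [path.name for path in files]


# Example usage (the archive and weights file names are placeholders):
# build_submission("hw5.zip", ["solution.py", "WalkerPolicy.py"],
#                  optional=["walker_weights.pt"])
```

Failing early on a missing required file avoids uploading an incomplete archive shortly before the deadline.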
If you have any questions, please do not hesitate to ask at korcadav@fel.cvut.cz.