HW5 - Deep Reinforcement Learning

This homework serves as an introduction to deep reinforcement learning methods. You will implement a particular approach to RL known as policy gradient, in which a neural network learns to control a given dynamical system through interaction.

Concretely, you will complete the implementation of 6 functions in the solution.py file. These functions implement the different losses that we will use to train our neural network. Finally, you will be tasked with implementing a controller for an underactuated quadruped-robot system.
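To give a feel for what a policy gradient loss looks like, here is a minimal sketch of the classic REINFORCE surrogate loss in plain Python. It is an illustration only: the function names discounted_returns and policy_gradient_loss, the discount factor gamma, and the toy numbers are assumptions made for this example, and they are not the functions or signatures required in solution.py.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = sum_k gamma^k * r_{t+k} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def policy_gradient_loss(log_probs, returns):
    """Surrogate loss -mean(log pi(a_t | s_t) * G_t) over one episode."""
    assert len(log_probs) == len(returns)
    weighted = [lp * g for lp, g in zip(log_probs, returns)]
    return -sum(weighted) / len(weighted)


# Toy episode (hypothetical numbers): three steps, each action taken with
# probability 0.5 under the current policy.
rewards = [1.0, 0.0, 2.0]
returns = discounted_returns(rewards, gamma=0.5)  # [1.5, 1.0, 2.0]
log_probs = [math.log(0.5)] * 3
loss = policy_gradient_loss(log_probs, returns)
```

In an actual training loop the log-probabilities would come from the neural network as differentiable tensors, so that minimizing this loss with a gradient-based optimizer increases the probability of actions that were followed by high returns.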

A Jupyter notebook, task.ipynb, is provided to guide you through the implementation and the theory behind the methods.

All the necessary files and instructions are available at https://github.com/urob-ctu/hw5-rl.

Submission and Evaluation

This assignment is worth a maximum of 10+1 points. The solution.py file contains 6 functions that need to be completed, and a total of 10 points is distributed among them. The remaining 1 point is awarded for your WalkerPolicy, located in the WalkerPolicy.py file, which must be capable of traveling at least 1 meter in the positive x-direction in less than five simulation seconds. You can gain up to 2 additional points from the tournament, where the distance traveled in 10 simulation seconds determines your rank among all submissions.
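Before submitting, it can help to check the walking criterion on a logged trajectory yourself. The sketch below is a hypothetical helper, not the official evaluator: the name passes_walk_test, the assumption that distance is measured relative to the starting x-position, and the assumption that positions are logged once per fixed time step dt are all choices made for this example.

```python
def passes_walk_test(x_positions, dt, min_distance=1.0, time_limit=5.0):
    """Check whether the walker covers min_distance in +x before time_limit.

    x_positions: x-coordinate of the robot logged at every step (assumed).
    dt: simulation time between two logged positions, in seconds (assumed).
    """
    start = x_positions[0]
    for step, x in enumerate(x_positions):
        # Only samples strictly before the time limit count.
        if step * dt >= time_limit:
            break
        if x - start >= min_distance:
            return True
    return False


# Hypothetical trajectories sampled every 0.5 s.
fast = [0.25 * i for i in range(12)]   # reaches 1.0 m at t = 2.0 s
slow = [0.05 * i for i in range(12)]   # only 0.45 m at t = 4.5 s
fast_ok = passes_walk_test(fast, dt=0.5)
slow_ok = passes_walk_test(slow, dt=0.5)
```

The official evaluation may measure distance or time differently, so treat a passing result here as a sanity check rather than a guarantee.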

Upload solution.py, WalkerPolicy.py, and, if applicable, the stored weights of the policy to BRUTE in a zip archive. The submission deadline is 4.1.2025 at 23:59:00.
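One possible way to build the archive is with Python's standard zipfile module, sketched below. The helper name make_submission is hypothetical, and placing every file at the top level of the archive is an assumption; check the repository instructions for the layout BRUTE actually expects.

```python
import zipfile
from pathlib import Path


def make_submission(archive_path, required, optional=()):
    """Zip the required files plus any optional files that exist.

    Returns the list of file names stored in the archive.
    """
    missing = [name for name in required if not Path(name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing required files: {missing}")
    # Optional files (e.g. stored weights) are added only if present.
    files = list(required) + [n for n in optional if Path(n).is_file()]
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in files:
            # Assumption: every file sits at the top level of the archive.
            archive.write(name, arcname=Path(name).name)
    return [Path(name).name for name in files]
```

For example, make_submission("hw5.zip", ["solution.py", "WalkerPolicy.py"], optional=["weights.pt"]) would create hw5.zip in the current directory, where hw5.zip and weights.pt are placeholder names for the archive and for wherever your policy weights are stored.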

If you have any questions, please do not hesitate to ask at korcadav@fel.cvut.cz.