====== HW 5 - Deep Reinforcement Learning ======

This homework is an introduction to deep reinforcement learning. You will implement a family of RL methods known as policy gradient methods, in which a neural network learns to control a given dynamical system by interacting with it.
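For orientation, the core of a policy gradient method can be sketched as follows. This is a minimal illustration of the classic REINFORCE estimator. It assumes a single episode, discounted reward-to-go returns, and that ''log_probs'' holds the log-probabilities of the actions actually taken. The names ''discounted_returns'' and ''reinforce_loss'' are hypothetical and do not necessarily correspond to any of the functions in ''solution.py''. NumPy only evaluates the loss value here; in an autodiff framework such as PyTorch, differentiating the same expression with respect to the network parameters yields the policy-gradient estimate.

<code python>
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Reward-to-go G_t = sum_{k >= t} gamma^(k - t) * r_k for one episode."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    # Walk backwards so each G_t reuses G_{t+1}: G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Surrogate loss -sum_t log pi(a_t | s_t) * G_t (hypothetical helper).

    Minimizing it performs gradient ascent on the expected return, because its
    gradient is the (negated) REINFORCE policy-gradient estimate.
    """
    returns = discounted_returns(rewards, gamma)
    return -np.sum(np.asarray(log_probs, dtype=float) * returns)
</code>

For example, ''discounted_returns([1, 1, 1], gamma=0.5)'' gives ''[1.75, 1.5, 1.0]''. Minimizing the negated weighted sum increases the probability of actions that were followed by high returns.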

Specifically, you will complete the implementation of 6 functions in the ''solution.py'' file. These functions implement the different losses used to train the neural network. Finally, you will implement a controller for an underactuated quadruped robot.

A Jupyter notebook, ''task.ipynb'', is provided to guide you through the implementation and the theory behind the methods.

All the necessary files and instructions are available at [[https://github.com/urob-ctu/hw5-rl]].


===== Submission and Evaluation =====

This assignment is worth a maximum of <wrap em>10+1 points</wrap>. There are 6 functions to complete in the ''solution.py'' file, and 10 points are distributed among them. The remaining 1 point is awarded for your ''WalkerPolicy'', located in the ''WalkerPolicy.py'' file. The policy must move the walker at least 1 meter in the positive x-direction in less than five seconds of simulation time.
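The walker criterion amounts to a simple check on the robot's trajectory. The sketch below assumes you have recorded the x-coordinate of the robot's base at a fixed simulation timestep ''dt'', and that distance is measured from the starting position. The function ''walked_far_enough'' is hypothetical and is not the official BRUTE evaluation; it is only a way to sanity-check your policy locally.

<code python>
import numpy as np


def walked_far_enough(x_positions, dt, min_distance=1.0, time_limit=5.0):
    """Hypothetical local check of the walker criterion.

    Assumes x_positions[i] is the base x-coordinate at time i * dt and that
    progress is measured relative to the first recorded position.
    Returns True if x grows by at least min_distance at some time
    strictly before time_limit seconds of simulated time.
    """
    x = np.asarray(x_positions, dtype=float)
    times = np.arange(len(x)) * dt
    # Keep only samples recorded before the time limit.
    within_limit = times < time_limit
    displacement = x[within_limit] - x[0]
    return bool(np.any(displacement >= min_distance))
</code>

Checking displacement from the start (rather than total path length) matches the requirement of traveling in the positive x-direction: a policy that shuffles back and forth does not pass.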

Upload ''solution.py'', ''WalkerPolicy.py'', and, if your policy needs them, the stored policy weights to BRUTE in a single zip archive. The submission deadline is 4.1.2025 at 23:59:00.


{{https://cw.fel.cvut.cz/wiki/_media/courses/b3b33urob/tutorials/walker.gif|Trained walker policy}}