====== Lab12 - Reinforcement Learning with an Inchworm Robot ======
^ Motivations and Goals ^
| Become familiar with [[courses:uir:hw:t4a-rl|t4a-rl assignment]] |
| Install and become familiar with t4a-rl setup |
| Design a simple reward function and a dummy absorbing state definition |
^ Tasks ([[courses:uir:internal:instructions:lab12|teacher]]) ^
| [[courses:uir:hw:t4a-rl|T4a-rl]] **(5 Points)** Reinforcement Learning |
==== Become Familiar with T4a-rl ====
[[courses:uir:hw:t4a-rl|T4a-rl - Reinforcement Learning]].
==== Installation (Ubuntu LTS >= 20.04) ====
* The assignment setup was designed for Python 3.10 or Python 3.11, so verify the installed version first.
python3 --version
* Install the required version, if necessary.
sudo apt update
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt install python3.10-full -y
* To separate the assignment from other Python packages managed by pip, install the virtual environment package and create a new virtual environment.
sudo apt install python3-pip -y
pip3 install virtualenv --upgrade
virtualenv inchworm_rl_venv --python=python3.10
* Enter the newly created virtual environment and install the required dependencies.
source inchworm_rl_venv/bin/activate && pip3 install -r requirements.txt
The previously mentioned steps are summarized in the provided ''install-venv.sh''.
Mixing Venv and Conda causes compatibility issues on machines that already run Conda; hence, either create the environment with Conda instead or deactivate Conda completely (''conda deactivate'') before using Venv.
==== Familiarizing with Assignment Setup ====
To familiarize yourself with the simulator setup, it is recommended that you use the MuJoCo simulator outside the reinforcement learning pipeline by following these steps.
- Download MuJoCo from the [[https://github.com/google-deepmind/mujoco/releases|GitHub releases]] page and unpack it.
- Open the MuJoCo simulator (run ''bin/simulate.sh'' from the root directory of the unpacked archive).
- Add ''inchworm.xml'' from the ''model'' directory by dragging and dropping it into the MuJoCo window.
Then you are free to:
* Explore the joints' positions in the second //Control// card in the right column,
* Show the visual elements by pressing the ''4'' key,
* Hide the collision elements by pressing the ''1'' key.
Note that the visual elements are purely cosmetic and play no role during training.
Examine the robot part names under the //Rendering// tab in the left column.
* Show the geom names by selecting //Label -> Geom//; these names are used by ''is_touching''.
* Show the body coordinate frames by selecting //Frame -> Body//; these frames are used to get position, rotation, and velocity.
* Show the body names by selecting //Label -> Body//; these names are used to get position, rotation, and velocity.
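The same names and quantities can also be read programmatically. Below is a minimal inspection sketch using the official ''mujoco'' Python bindings (assumed to be installed via ''requirements.txt''); the model path ''model/inchworm.xml'' and the commented-out body name are placeholders to adapt to your checkout.
<code python>
import mujoco

# Load the inchworm model; the path is an assumption, adapt it as needed.
model = mujoco.MjModel.from_xml_path("model/inchworm.xml")
data = mujoco.MjData(model)
mujoco.mj_step(model, data)  # advance the simulation by one step

# List all body and geom names defined in the XML (the same names shown
# by Label -> Body and Label -> Geom in the simulator).
print("bodies:", [model.body(i).name for i in range(model.nbody)])
print("geoms:", [model.geom(i).name for i in range(model.ngeom)])

# Named access to a body's world position and 6D velocity; replace
# "some-body" with one of the names printed above.
# print(data.body("some-body").xpos, data.body("some-body").cvel)
</code>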
==== Tasks ====
=== Reward Function Design I ===
Design a simple reward function that uses the average forward speed of the first and last servomotors in centimetres per second.
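A minimal sketch of such a reward is given below, assuming the ''mujoco'' bindings and hypothetical body names ''servo-first'' and ''servo-last'' for the first and last servomotors; the forward direction is assumed to be the world x axis, and both names must be adapted to the actual model.
<code python>
import mujoco

M_TO_CM = 100.0  # MuJoCo works in metres; the task asks for cm/s


def forward_speed_reward(data: mujoco.MjData) -> float:
    """Average forward speed of the first and last servos in cm/s."""
    # cvel stores [angular(3), linear(3)]; index 3 is the linear x
    # component, assumed here to be the forward direction.
    v_first = data.body("servo-first").cvel[3]
    v_last = data.body("servo-last").cvel[3]
    return 0.5 * (v_first + v_last) * M_TO_CM
</code>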
=== Absorbing State Detection ===
Design a simple absorbing-state detection such that any state in which the robot touches the ground with parts other than the scales and bumpers is marked as absorbing.
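A minimal sketch of such a check using raw MuJoCo contacts follows; it assumes the ground geom is named ''floor'' and the allowed geoms contain ''scale'' or ''bumper'' in their names. The assignment's ''is_touching'' helper can replace the explicit contact loop.
<code python>
import mujoco

ALLOWED = ("scale", "bumper")  # parts that may touch the ground


def is_absorbing(model: mujoco.MjModel, data: mujoco.MjData) -> bool:
    floor_id = mujoco.mj_name2id(model, mujoco.mjtObj.mjOBJ_GEOM, "floor")
    for i in range(data.ncon):
        contact = data.contact[i]
        if floor_id not in (contact.geom1, contact.geom2):
            continue  # not a contact with the ground
        other = contact.geom2 if contact.geom1 == floor_id else contact.geom1
        name = mujoco.mj_id2name(model, mujoco.mjtObj.mjOBJ_GEOM, other) or ""
        if not any(part in name for part in ALLOWED):
            return True  # a disallowed part touches the ground
    return False
</code>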
=== Reward Function Design II ===
Extend the previously implemented reward function by penalizing states in which ''joint-1'' and ''joint-2'' bend significantly toward the ground, i.e., their angles drop below 0 degrees.
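A minimal sketch of the extension, reusing ''forward_speed_reward'' from the previous task; joint angles are read in radians, and the penalty weight is an assumed value to tune on the actual setup.
<code python>
import mujoco

PENALTY_WEIGHT = 1.0  # assumed value; tune it experimentally


def shaped_reward(data: mujoco.MjData) -> float:
    reward = forward_speed_reward(data)
    for name in ("joint-1", "joint-2"):
        angle = data.joint(name).qpos[0]  # hinge angle in radians
        if angle < 0.0:  # below 0 degrees, i.e. bending toward the ground
            reward -= PENALTY_WEIGHT * (-angle)  # proportional penalty
    return reward
</code>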