====== Lab12 - Reinforcement Learning with an Inchworm Robot ======

^ Motivations and Goals ^
| Become familiar with the [[courses:uir:hw:t4a-rl|t4a-rl assignment]] |
| Install and become familiar with the t4a-rl setup |
| Design a simple reward function and a dummy absorbing state definition |
^ Tasks ([[courses:uir:internal:instructions:lab12|teacher]]) ^
| [[courses:uir:hw:t4a-rl|T4a-rl]] **(5 Points)** Reinforcement Learning |

==== Become Familiar with T4a-rl ====

[[courses:uir:hw:t4a-rl|T4a-rl - Reinforcement Learning]].

==== Installation (Ubuntu LTS >= 20.04) ====

  * The assignment setup was designed for Python 3.10 or Python 3.11, so verify the installed version first.

<code bash>
python3 --version
</code>

  * Install the required version, if necessary.

<code bash>
sudo apt update
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt install python3.10-full -y
</code>

  * To separate the assignment from other Python packages managed by pip, install the virtual environment package and create a new virtual environment.

<code bash>
sudo apt install python3-pip -y
pip3 install virtualenv --upgrade
virtualenv inchworm_rl_venv --python=python3.10
</code>

  * Enter the newly created virtual environment and install the required dependencies.

<code bash>
source inchworm_rl_venv/bin/activate && pip3 install -r requirements.txt
</code>

The steps above are summarized in the provided ''install-venv.sh''.

Venv conflicts with Conda on machines where Conda is already installed; either manage the environment with Conda instead, or deactivate Conda completely before using Venv.

==== Familiarizing with the Assignment Setup ====

To familiarize yourself with the simulator setup, it is recommended to run the MuJoCo simulator standalone, outside the reinforcement learning pipeline, by following these steps.

  - Download MuJoCo from the [[https://github.com/google-deepmind/mujoco/releases|GitHub releases]] page and unpack it.
  - Open the MuJoCo simulator (run ''bin/simulate'' in the root directory of the unpacked archive).
  - Add ''inchworm.xml'' from the ''model'' directory by dragging and dropping it into the MuJoCo window.

Then you are free to
  * explore the joint positions in the second //Control// card in the right column,
  * show the visual elements with the ''4'' key,
  * hide the collision elements with the ''1'' key.

Note that the visual elements are purely cosmetic and play no role during training.

Examine the robot part names under the //Rendering// tab in the left column.
  * Show geom names by selecting //Label -> Geom//; these are the names used by ''is_touching''.
  * Show body coordinate frames by selecting //Frame -> Body//; these are used to read each part's position, rotation, and velocity.
  * Show body names by selecting //Label -> Body//; these are the names used to read each part's position, rotation, and velocity.

==== Tasks ====

Hedged example sketches for all three tasks follow the task descriptions below.

=== Reward Function Design I ===

Design a simple reward function that uses the average forward speed of the first and last servomotors in centimetres per second.

=== Absorbing State Detection ===

Design a simple absorbing state detection such that any state in which the robot touches the ground with parts other than the scales and bumpers is marked as absorbing.

=== Reward Function Design II ===

Extend the previously implemented simple reward function by penalizing states where ''joint-1'' and ''joint-2'' move significantly close to the ground, i.e., below 0 degrees.
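=== Example Sketches ===

For //Reward Function Design I//, the following is a minimal sketch, not the assignment's actual API: it assumes a hypothetical ''robot'' object with a ''get_body_velocity(name)'' helper returning linear velocity in metres per second, assumed servomotor body names ''servo-1'' and ''servo-4'', and forward motion along the positive x axis. Check the assignment template for the real names and accessors (readable in MuJoCo via //Label -> Body//).

<code python>
# Hypothetical sketch of the simple forward-speed reward.
# Assumed (not from the assignment): get_body_velocity() returning an
# (x, y, z) tuple in m/s, body names "servo-1"/"servo-4", and the
# positive x axis as the forward direction.

M_TO_CM = 100.0  # metres per second -> centimetres per second


def reward(robot) -> float:
    """Average forward speed of the first and last servomotors in cm/s."""
    v_first = robot.get_body_velocity("servo-1")[0]  # forward (x) component
    v_last = robot.get_body_velocity("servo-4")[0]
    return M_TO_CM * (v_first + v_last) / 2.0
</code>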
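For //Absorbing State Detection//, a minimal sketch assuming the ''is_touching'' helper mentioned above takes two geom names, that the ground geom is called ''floor'', and that the scale and bumper geom names below are placeholders; read the real names off the MuJoCo //Label -> Geom// view.

<code python>
# Hypothetical sketch of the absorbing state test.
# Assumed (not from the assignment): is_touching(geom_a, geom_b) as a
# pairwise contact test, ground geom "floor", and placeholder names for
# the scale and bumper geoms that are allowed to touch the ground.

ALLOWED_GEOMS = {"scale-1", "scale-2", "bumper-1", "bumper-2"}  # placeholders


def is_absorbing(robot, all_geoms) -> bool:
    """A state is absorbing if any disallowed geom touches the ground."""
    return any(
        robot.is_touching(geom, "floor")
        for geom in all_geoms
        if geom not in ALLOWED_GEOMS
    )
</code>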
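For //Reward Function Design II//, a minimal sketch that builds on the reward sketched above; it assumes a hypothetical ''get_joint_position(name)'' helper returning the joint angle in degrees and a tunable penalty constant subtracted per offending joint. A fixed subtraction is only one plausible penalty shape; a term proportional to how far the joint dips below zero is an equally valid design.

<code python>
# Hypothetical sketch of the extended reward with a joint-angle penalty.
# Assumed (not from the assignment): get_joint_position() returning the
# joint angle in degrees, and reward() from the previous sketch.

PENALTY = 1.0  # per-joint penalty magnitude, to be tuned


def extended_reward(robot) -> float:
    """Forward-speed reward minus a penalty for joints bent below 0 degrees."""
    r = reward(robot)
    for joint in ("joint-1", "joint-2"):
        if robot.get_joint_position(joint) < 0.0:  # too close to the ground
            r -= PENALTY
    return r
</code>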