Lab12 - Reinforcement Learning with an Inchworm Robot

Motivations and Goals
Become familiar with the T4a-rl assignment
Install the T4a-rl setup and become familiar with it
Design a simple reward function and a dummy absorbing state definition
Tasks (teacher)
T4a-rl (5 Points) Reinforcement Learning

Become Familiar with T4a-rl

T4a-rl - Reinforcement Learning.

Installation (Ubuntu LTS >= 20.04)

The assignment setup was designed for Python 3.10 or Python 3.11, so verify the installed version first.

python3 --version

Install the required version, if necessary.

sudo apt update
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt install python3.10-full -y

To separate the assignment from other Python packages managed by pip, install the virtual environment package and create a new virtual environment.

sudo apt install python3-pip -y
pip3 install virtualenv --upgrade
virtualenv inchworm_rl_venv --python=python3.10

Enter the newly created virtual environment and install the required dependencies.

source inchworm_rl_venv/bin/activate && pip3 install -r requirements.txt

The previously mentioned steps are summarized in the provided install-venv.sh.

Using a virtual environment (venv) on a machine that already runs Conda can cause compatibility issues; either use a Conda environment instead, or deactivate Conda completely while working in the venv.

Getting Familiar with the Assignment Setup

To familiarize yourself with the simulator setup, it is recommended that you use the MuJoCo simulator outside the reinforcement learning pipeline by following these steps.

Download a MuJoCo release from GitHub and unpack it.
Open the MuJoCo simulator (run bin/simulate.sh in the MuJoCo unpacked archive root directory).
Add inchworm.xml from the model directory by dragging and dropping it into the MuJoCo window.

Then you are free to

Explore the joint positions in the second Control card in the right column,
Show the visual elements with the 4 key,
Hide the collision elements with the 1 key.

Note that visual elements are purely visual and play no role during training.

Examine the robot part names under the Rendering tab in the left column.

Show the geom names, which are used by is_touching, by selecting Label → Geom,
Show the coordinate frames of the bodies, which are used to get position, rotation, and velocity, by selecting Frame → Body,
Show the body names, which are used to get position, rotation, and velocity, by selecting Label → Body.

Tasks

Reward Function Design I

Design a simple reward function that uses the average forward speed of the first and last servomotors in centimetres per second.
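One possible shape of such a reward is sketched below. It is not the assignment's reference solution: the function name, the argument layout, the choice of the x axis as the forward direction, and the assumption that velocities arrive in metres per second (MuJoCo's default SI units) are all illustrative and should be checked against the actual t4a-rl interface.

```python
from typing import Sequence

M_TO_CM = 100.0  # assumption: the simulator reports velocities in m/s


def average_forward_speed_reward(
    first_servo_velocity: Sequence[float],
    last_servo_velocity: Sequence[float],
    forward_axis: int = 0,  # assumption: x is the forward direction
) -> float:
    """Return the mean forward speed of the two servomotors in cm/s."""
    first_cm_s = first_servo_velocity[forward_axis] * M_TO_CM
    last_cm_s = last_servo_velocity[forward_axis] * M_TO_CM
    return 0.5 * (first_cm_s + last_cm_s)


# Hypothetical velocities (vx, vy, vz) in m/s for the first and last servo.
reward = average_forward_speed_reward((0.02, 0.0, 0.0), (0.04, 0.0, 0.0))
```

Averaging the two ends rewards motion of the whole body rather than of one end only; moving backwards yields a negative reward.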

Absorbing State Detection

Design a simple absorbing state detection that marks a state as absorbing whenever any part of the robot other than the scales and bumpers touches the ground.
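The rule can be sketched as a small predicate over the names of the parts currently in contact with the ground. How those names are collected (for example via is_touching) is left out here, and the name prefixes "scale" and "bumper" are assumptions; use the geom names shown in the MuJoCo viewer for the real model.

```python
from typing import Iterable, Tuple

# Assumption: allowed parts can be recognized by these name prefixes.
ALLOWED_PREFIXES: Tuple[str, ...] = ("scale", "bumper")


def is_absorbing(
    ground_contacts: Iterable[str],
    allowed_prefixes: Tuple[str, ...] = ALLOWED_PREFIXES,
) -> bool:
    """Return True if any part other than scales and bumpers touches the ground."""
    return any(
        not name.startswith(allowed_prefixes) for name in ground_contacts
    )


# Hypothetical part names: the first state is fine, the second is absorbing.
ok_state = is_absorbing(["scale-1", "bumper-front"])
bad_state = is_absorbing(["scale-1", "servo-2"])
```

A state with no ground contact at all is not marked as absorbing by this sketch; whether that is desirable depends on the task definition.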

Reward Function Design II

Extend the previously implemented simple reward function by penalizing states where joint-1 and joint-2 move too close to the ground, i.e., below 0 degrees.
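A minimal sketch of such a penalty follows, assuming the joint angles are available in degrees and that each joint is penalized separately in proportion to how far it drops below 0 degrees. The function name, the linear penalty, and the weight penalty_per_degree are illustrative choices, not part of the assignment.

```python
def penalized_reward(
    base_reward: float,
    joint_1_deg: float,
    joint_2_deg: float,
    penalty_per_degree: float = 0.1,  # assumption: tune for the task
) -> float:
    """Subtract a penalty that grows with each joint's angle below 0 degrees."""
    # max(0, -angle) is zero for non-negative angles and positive otherwise.
    violation_deg = max(0.0, -joint_1_deg) + max(0.0, -joint_2_deg)
    return base_reward - penalty_per_degree * violation_deg


# Hypothetical values: base reward of 3.0 cm/s, joint-1 at -10 degrees.
reward = penalized_reward(3.0, joint_1_deg=-10.0, joint_2_deg=5.0)
```

A penalty that scales with the violation gives the agent a gradient back towards valid poses, whereas a fixed penalty would only signal that the limit was crossed.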
