====== Lab12 - Reinforcement Learning with an Inchworm Robot ======

^ Motivations and Goals ^
| Become familiar with the [[courses:uir:hw:t4a-rl|t4a-rl assignment]] |
| Install and become familiar with the t4a-rl setup |
| Design a simple reward function and a dummy absorbing state definition |
^ Tasks ([[courses:uir:internal:instructions:lab12|teacher]]) ^
| [[courses:uir:hw:t4a-rl|T4a-rl]] **(5 Points)** Reinforcement Learning |

==== Become Familiar with T4a-rl ====

[[courses:uir:hw:t4a-rl|T4a-rl - Reinforcement Learning]].

==== Installation (Ubuntu LTS >= 20.04) ====

  * The assignment setup was designed for Python 3.10 or Python 3.11, so verify the installed version first.

<code bash>
python3 --version
</code>

  * Install the required version, if necessary.

<code bash>
sudo apt update
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt install python3.10-full -y
</code>

  * To separate the assignment from other Python packages managed by pip, install the virtual environment package and create a new virtual environment.

<code bash>
sudo apt install python3-pip -y
pip3 install virtualenv --upgrade
virtualenv inchworm_rl_venv --python=python3.10
</code>

  * Enter the newly created virtual environment and install the required dependencies.

<code bash>
source inchworm_rl_venv/bin/activate && pip3 install -r requirements.txt
</code>

The steps above are summarized in the provided ''install-venv.sh''.

Venv conflicts with Conda on machines where Conda is already installed; either manage the environment with Conda instead, or deactivate Conda completely before using Venv.

==== Familiarizing with the Assignment Setup ====

To familiarize yourself with the simulator setup, it is recommended to run the MuJoCo simulator standalone, outside the reinforcement learning pipeline, by following these steps.

  - Download MuJoCo from the [[https://github.com/google-deepmind/mujoco/releases|GitHub releases]] page and unpack it.
  - Open the MuJoCo simulator (run ''bin/simulate'' in the root directory of the unpacked archive).
  - Add ''inchworm.xml'' from the ''model'' directory by dragging and dropping it into the MuJoCo window.

Then you are free to
  * explore the joint positions in the second //Control// card in the right column,
  * show the visual elements with the ''4'' key,
  * hide the collision elements with the ''1'' key.

Note that the visual elements are purely cosmetic and play no role during training.

Examine the robot part names under the //Rendering// tab in the left column.
  * Show geom names by selecting //Label -> Geom//; these are the names used by ''is_touching''.
  * Show body coordinate frames by selecting //Frame -> Body//; these are used to read each part's position, rotation, and velocity.
  * Show body names by selecting //Label -> Body//; these are the names used to read each part's position, rotation, and velocity.

==== Tasks ====

Hedged example sketches for all three tasks follow the task descriptions below.

=== Reward Function Design I ===

Design a simple reward function that uses the average forward speed of the first and last servomotors in centimetres per second.

=== Absorbing State Detection ===

Design a simple absorbing state detection such that any state in which the robot touches the ground with parts other than the scales and bumpers is marked as absorbing.

=== Reward Function Design II ===

Extend the previously implemented simple reward function by penalizing states where ''joint-1'' and ''joint-2'' move significantly close to the ground, i.e., below 0 degrees.
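=== Example Sketches ===

For //Reward Function Design I//, the following is a minimal sketch, not the assignment's actual API: it assumes a hypothetical ''robot'' object with a ''get_body_velocity(name)'' helper returning linear velocity in metres per second, assumed servomotor body names ''servo-1'' and ''servo-4'', and forward motion along the positive x axis. Check the assignment template for the real names and accessors (readable in MuJoCo via //Label -> Body//).

<code python>
# Hypothetical sketch of the simple forward-speed reward.
# Assumed (not from the assignment): get_body_velocity() returning an
# (x, y, z) tuple in m/s, body names "servo-1"/"servo-4", and the
# positive x axis as the forward direction.

M_TO_CM = 100.0  # metres per second -> centimetres per second


def reward(robot) -> float:
    """Average forward speed of the first and last servomotors in cm/s."""
    v_first = robot.get_body_velocity("servo-1")[0]  # forward (x) component
    v_last = robot.get_body_velocity("servo-4")[0]
    return M_TO_CM * (v_first + v_last) / 2.0
</code>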
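For //Absorbing State Detection//, a minimal sketch assuming the ''is_touching'' helper mentioned above takes two geom names, that the ground geom is called ''floor'', and that the scale and bumper geom names below are placeholders; read the real names off the MuJoCo //Label -> Geom// view.

<code python>
# Hypothetical sketch of the absorbing state test.
# Assumed (not from the assignment): is_touching(geom_a, geom_b) as a
# pairwise contact test, ground geom "floor", and placeholder names for
# the scale and bumper geoms that are allowed to touch the ground.

ALLOWED_GEOMS = {"scale-1", "scale-2", "bumper-1", "bumper-2"}  # placeholders


def is_absorbing(robot, all_geoms) -> bool:
    """A state is absorbing if any disallowed geom touches the ground."""
    return any(
        robot.is_touching(geom, "floor")
        for geom in all_geoms
        if geom not in ALLOWED_GEOMS
    )
</code>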
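For //Reward Function Design II//, a minimal sketch that builds on the reward sketched above; it assumes a hypothetical ''get_joint_position(name)'' helper returning the joint angle in degrees and a tunable penalty constant subtracted per offending joint. A fixed subtraction is only one plausible penalty shape; a term proportional to how far the joint dips below zero is an equally valid design.

<code python>
# Hypothetical sketch of the extended reward with a joint-angle penalty.
# Assumed (not from the assignment): get_joint_position() returning the
# joint angle in degrees, and reward() from the previous sketch.

PENALTY = 1.0  # per-joint penalty magnitude, to be tuned


def extended_reward(robot) -> float:
    """Forward-speed reward minus a penalty for joints bent below 0 degrees."""
    r = reward(robot)
    for joint in ("joint-1", "joint-2"):
        if robot.get_joint_position(joint) < 0.0:  # too close to the ground
            r -= PENALTY
    return r
</code>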