===== T4b-inch - Deployment of Reinforcement Learning on a Real Robot =====


|**Deadline**  | Submit at least 24 hours before your exam  |
| | January 11, 2025, 23:59 PST |
| |   <del>January 13, 2025</del> \\ <del>January 20, 2025</del> \\ <del>February 04, 2025</del>  \\ <del>February 11, 2025</del> |
|**Points**  |  5 (**Bonus Points**) |
|**Label in BRUTE**  | t4b-inch |
|**Files to submit** | archive with ''evaluator.py'' and ''agent.msh''|

----

==== Introduction ====

This task extends the previous one; students are therefore advised to familiarize themselves with the [[courses:uir:hw:t4a-rl|T4a-rl - Reinforcement Learning]] task first, as all information given there also applies to this task.

==== Assignment ====

This task further evaluates the gait trained in T4a-rl, either by deploying it on a real inchworm robot or by an extended evaluation in the BRUTE system.

==== Evaluation ====

The project files (''evaluator.py'' and ''agent.msh'') are submitted to BRUTE before the student's exam; the automatic evaluation then assigns additional points.
Deployment on the real robot is possible during the project submission, when the experimental setup, including the required packages and hardware, is prepared on designated PCs for the students' convenience.

Before the deployment, the submitted ''agent.msh'' is downloaded from BRUTE to a designated PC.
The gait is run on the real robot for 30 seconds, starting in the most elongated pose.
  * Each 5 cm traversed by the backmost part of the robot earns one point.
The median of three runs determines the deployment score.
A run in which the robot loses stability or fails in another way can be repeated, up to three repetitions in total.
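
The deployment scoring can be summarized in a short Python sketch. It is not the course's evaluation code; it assumes that distances are measured in centimetres, that an unfinished 5 cm segment earns no point (the distance is rounded down), and that exactly three accepted runs are scored. The names ''run_points'' and ''deployment_score'' are hypothetical.

<code python>
import math
import statistics

SEGMENT_CM = 5.0  # one point per 5 cm traversed by the backmost part


def run_points(distance_cm: float) -> int:
    """Points for one real-robot run (assumption: partial segments are not counted)."""
    if distance_cm <= 0:
        return 0  # no forward progress, no points
    return math.floor(distance_cm / SEGMENT_CM)


def deployment_score(distances_cm: list[float]) -> float:
    """Median of the per-run points over the three accepted runs."""
    if len(distances_cm) != 3:
        raise ValueError("expected exactly three accepted runs")
    return statistics.median([run_points(d) for d in distances_cm])


# Runs of 12 cm, 27.5 cm and 8 cm earn 2, 5 and 1 points; the median is 2.
print(deployment_score([12.0, 27.5, 8.0]))
</code>

Using the median rather than the mean means a single unusually good or bad run does not dominate the deployment score.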

In addition to the real-robot evaluation, the evaluation system runs the trained agent's policy in simulation for 30 seconds.
The distance travelled by the backmost servomotor (''servo-3'') is measured and averaged over the last ten simulation steps to determine the distance travelled by the robot.
BRUTE scores each run as follows:
  * 1 point is assigned if only the scales and bumpers touched the ground,
  * for each 5 cm travelled after the 20 cm mark, 1 point is assigned.
These points are summed, and at most 5 points are awarded per simulation run.
Ten simulation runs are executed, and the median of the points achieved is assigned as the final score.
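
The simulation scoring can be sketched in the same way. This is not the actual BRUTE evaluator; it assumes that the ''servo-3'' positions are available per simulation step as x-coordinates in centimetres, that forward motion is along +x, that the first sample is the starting pose, and that unfinished 5 cm segments are rounded down. All function and constant names are hypothetical.

<code python>
import math
import statistics

THRESHOLD_CM = 20.0  # no distance points below the 20 cm mark
SEGMENT_CM = 5.0     # one point per 5 cm beyond the mark
MAX_RUN_POINTS = 5   # cap per simulation run
WINDOW = 10          # average over the last ten simulation steps


def travelled_distance_cm(servo3_x_cm: list[float]) -> float:
    """Mean servo-3 position over the last ten steps minus the starting position.

    Assumption: forward is +x and the first sample is the starting pose.
    """
    if len(servo3_x_cm) < WINDOW + 1:
        raise ValueError("need the start sample plus at least ten steps")
    return statistics.fmean(servo3_x_cm[-WINDOW:]) - servo3_x_cm[0]


def simulation_run_points(distance_cm: float, only_scales_and_bumpers: bool) -> int:
    """Points for one simulation run, capped at MAX_RUN_POINTS."""
    points = 1 if only_scales_and_bumpers else 0  # ground-contact point
    if distance_cm > THRESHOLD_CM:
        points += math.floor((distance_cm - THRESHOLD_CM) / SEGMENT_CM)
    return min(points, MAX_RUN_POINTS)


def simulation_score(runs: list[tuple[list[float], bool]]) -> float:
    """Median of per-run points; each run is (servo-3 x-trajectory, clean-contact flag)."""
    per_run = [simulation_run_points(travelled_distance_cm(xs), clean) for xs, clean in runs]
    return statistics.median(per_run)


# Ten identical runs reaching 30 cm with clean contacts earn 1 + 2 = 3 points each.
example_run = ([0.0] + [30.0] * 10, True)
print(simulation_score([example_run] * 10))
</code>

With ten runs, ''statistics.median'' averages the two middle values, so the sketch can return a half point; how BRUTE treats such a value is not specified here.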

Finally, the points from the real-robot deployment and from the extended BRUTE evaluation are summed, and at most 5 points are awarded as the final score.
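
The final combination is a capped sum. A minimal sketch with a hypothetical function name:

<code python>
MAX_TOTAL_POINTS = 5  # bonus points available for T4b-inch


def final_score(deployment_points: float, simulation_points: float) -> float:
    """Sum of the real-robot and BRUTE simulation points, capped at 5."""
    return min(deployment_points + simulation_points, MAX_TOTAL_POINTS)


print(final_score(2, 4))  # 6 points earned, capped to 5
print(final_score(1, 2.5))
</code>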

<note tip>Students can deploy their gaits on the real robots during the last lab.</note>


==== Hints and Observations ====

For a real deployment, two additional criteria should be considered.
First, the gait should disengage the scales when moving forward and re-engage them when staying in place.
Second, the gait should not lift the centre of mass unnecessarily, as doing so makes the robot more prone to losing balance.