The inchworm robot considered in this assignment is inspired by the morphology of snakes and caterpillars. It consists of four servomotors acting in the same plane to produce 1D motion, using flexible scale-like contact friction pads with directional frictional anisotropy, i.e., the friction coefficient in the preferred direction is lower than in the opposite direction. This frictional anisotropy enables the robot to anchor one end in place while the other end reaches a new pose.
Inchworm locomotion is organized as a repeating motion pattern called a gait. Several gaits have been designed by hand. In the sliding gait, frictional anisotropy is exploited for basic locomotion consisting of two gait phases (contraction and extension). The sliding gait can be extended by weight manipulation in the balancing gait, which adds balancing phases (balance forward and balance backward). Finally, contact through the friction pads can be turned off and on by resting on the rigid robot frame instead of the flexible scales when moving forward, as exploited in the stepping gait, which adds several new gait phases (front up, front down, back up and back down). A well-performing gait should integrate the locomotion strategies of these gaits.
In this task, you aim to design a suitable gait for an inchworm robot using reinforcement learning. To do so, two functions will be implemented: a reward function and an absorbing state definition. Both functions use the inchworm robot state interface, outlined in inchworm_interface.py and described further below.
The reward function (compute_reward(self, inchworm_state: InchwormInterface) → float in evaluator.py) uses the inchworm robot state to assign it a float value, i.e., desirable robot states should receive a higher reward than undesirable ones. Desirable states allow the robot to move forward as fast as possible while touching the ground only with the scales and bumpers; hence, the reward can, for example, be based on the forward velocity in centimetres per second, with a fixed penalty whenever the robot touches the ground with parts other than the scales and bumpers. For additional insight into possible reward function components, see Observations and Hints in the Appendix.
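For illustration only, a minimal sketch following the example above could look as follows; the weighting constant and the choice of servo-0 as the tracked part are assumptions, not a prescribed solution.
def compute_reward(self, inchworm_state: InchwormInterface) -> float:
    # forward (x-axis) linear velocity of the front servomotor, converted to cm/s
    forward_velocity = inchworm_state.get_part_velocity("servo-0")[3] * 100.0
    reward = forward_velocity
    # fixed penalty (assumed value) when any non-contacting part touches the ground
    if inchworm_state.is_touching("ground", "no-touch"):
        reward -= 10.0
    return reward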
The absorbing state classification (is_absorbing(self, inchworm_state: InchwormInterface) → bool, also in evaluator.py) uses the inchworm robot state to decide whether the state should be treated as absorbing, i.e., hard or impossible for the robot to recover from, or otherwise undesirable. For additional insight into possible absorbing state classifications, see Observations and Hints in the Appendix.
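As a hedged illustration, a state could, for instance, be treated as absorbing when the robot rests on one of its servomotors; the actual criterion is part of the design and this particular choice is only an assumption.
def is_absorbing(self, inchworm_state: InchwormInterface) -> bool:
    # example criterion (assumed): the robot is considered unrecoverable
    # when any servomotor rests on the ground
    for servo in ("servo-0", "servo-1", "servo-2", "servo-3"):
        if inchworm_state.is_touching("ground", servo):
            return True
    return False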
The designed functions are used during policy learning with the Soft Actor-Critic algorithm implemented in the Mushroom-RL framework, which evaluates the inchworm states during simulation in the MuJoCo simulator. After each simulation step, the robot state is evaluated by the reward function and then by the absorbing state detection function to steer the training algorithm.
The robot frame consists of four servomotors, referred to as servo-0 to servo-3 (numbered from the robot front), two stiff bumpers called bumper-front and bumper-back (depicted in blue), two contact scales called scales-front and scales-back (depicted in grey), two short brackets called bracket-front and bracket-back (depicted in yellow), and finally a double-sided bracket called bracket-middle (depicted in green).
The state of the inchworm robot can be accessed via six functions: three focus on the robot parts, two on the joint states, and one on collisions. The robot is located on an infinite xy plane with the forward direction identical to the x-axis and the joint axes parallel to the y-axis. The simulation operates in meters.
The get_part_position(self, part_name : str) → “np.array | None” takes a part name (defined in the Inchworm Robot Parts section using monospace text) and returns the part position vector [x,y,z] in the world coordinates.
inchworm_state.get_part_position("bracket-front") # gets the position of the front bracket inchworm_state.get_part_position("servo-0") # returns the position of the front servomotor
The get_part_rotation(self, part_name : str, degrees : bool = True) → “np.array | None” takes a part name and returns the part's XYZ Euler angles in the intrinsic convention within the part frame. Note that all part coordinate frames follow the D-H notation, i.e., each part frame is aligned with its associated joint such that the x-axis always points towards the closest joint axis and the joint axis is identical to the z-axis. The returned values are in degrees but can be changed to radians by setting the optional argument degrees=False.
inchworm_state.get_part_position("bracket-middle") # gets the rotation of the middle bracket inchworm_state.get_part_position("servo-1") # returns the rotation of the second servomotor
The get_part_velocity(self, part_name : str) → “np.array | None” takes a part name and returns the vector of angular velocities around respective axes in radians per second, followed by the linear velocity in meters per second.
inchworm_state.get_part_position("bumper-back") # gets the velocity of the middle bracket inchworm_state.get_part_position("servo-3") # returns the velocity of the back-most servomotor
The get_joint_position(self, joint_name : str, degrees : bool = True) → “float | None” takes a joint name (joint-0, joint-1, joint-2, joint-3) and returns the joint position (rotation around its shaft). The joint rotation is in degrees but can be changed to radians by setting the optional argument degrees=False.
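For example, a joint angle can be queried in degrees (the default) or in radians:
inchworm_state.get_joint_position("joint-0")  # gets the rotation of joint-0 in degrees
inchworm_state.get_joint_position("joint-2", degrees=False)  # returns the rotation of joint-2 in radians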
The get_joint_velocity(self, joint_name : str) → “float | None” takes a joint name and returns the joint rotation speed in radians per second.
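For example, the rotation speed of the second joint can be queried as:
inchworm_state.get_joint_velocity("joint-1")  # gets the rotation speed of joint-1 in radians per second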
Finally, the is_touching(self, part_name_1 : str, part_name_2 : str) → “bool | None” takes the names of two parts and returns whether they are touching each other (colliding). Moreover, the special name ground can be used to check whether a part is touching the ground, and no-touch can be used to decide whether any non-contacting part is touching the other provided part.
inchworm_state.is_touching("ground", "bumper-front") # check for collision between ground and front bumper inchworm_state.is_touching("ground", "no-touch") # check for collision between ground and non-contacting parts
Note that any of the abovementioned functions returns None whenever the given name is invalid.
Assuming that the (virtual) environment is set up as described in the Appendix below, main.py enables interaction with the reinforcement learning simulation setup via its program arguments.
To enumerate available program arguments, run:
python3 main.py --help
For initial training with visualization enabled, run:
python3 main.py
During training, the agent policy is periodically stored in checkpoint-N.msh files together with the corresponding checkpoint-N.metadata files.
For running the training headless (without visualization during evaluation), run:
python3 main.py --render=""
For observing an already trained agent policy located in path/to/agent.msh, run:
python3 main.py --load_agent="true" --agent_path_to_load="path/to/agent.msh" --render="true" --n_epochs=0
To further train a selected agent policy located in path/to/agent.msh, run:
python3 main.py --load_agent="true" --agent_path_to_load="path/to/agent.msh" --render=""
The proposed reward function and absorbing state classification (evaluator.py) and the trained agent's policy (agent.msh) must be submitted. The trained agent's policy is run for 30 seconds by the evaluation system. The distance travelled by the backmost servomotor (servo-3) is measured and averaged over the last ten simulation steps to determine the distance travelled by the robot.
brute_points_for_forward
brute_points_for_distance
brute_points_for_touching
The abovementioned points are summed, and up to 5 points, denoted brute_total_points in the metadata files, are assigned to the simulation run. Ten simulation runs are executed, and the median of the achieved points is assigned as the final score.
Aside from the policy evaluation, the student can provide a report.pdf summarizing the suggested approach and reporting on its performance. The report is awarded up to one additional point.
Check the installed Python version:
python3 --version
sudo apt update
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt install python3.10-full -y
sudo apt install python3-pip -y
pip3 install virtualenv --upgrade
virtualenv inchworm_rl_venv --python=python3.10
source inchworm_rl_venv/bin/activate && pip3 install -r requirements.txt
The previously mentioned steps are summarized in the provided install-venv.sh.
To familiarize yourself with the simulator setup, it is recommended that you use the MuJoCo simulator outside the reinforcement learning pipeline by following these steps.
Run bin/simulate.sh and load the inchworm.xml model from the model directory. Then you are free to explore the simulation and the robot model.
Note that the visual elements are purely cosmetic and play no role during training.
Examine the robot part names (as used by the interface functions such as is_touching) under the Rendering tab in the left column.
The inchworm locomotion should be as fast as possible. Hence, forward motion should be promoted, for example, by using the average velocity of the first and last servomotors (servo-0 and servo-3) along the forward x-axis direction as a part of the reward function.
The desired locomotion uses only the scales and bumpers when moving forward. Hence, any other robot part in contact with the ground should be penalized, for example, with a fixed penalty using the is_touching interface function. Moreover, when the state approaches an undesired one, an additional proportional penalty can be applied, for example, when joint-1 or joint-2 gets closer to the zero angle from the respective (positive or negative) direction.
It can be observed in hand-tuned gaits that the scales and bumpers do not move forward while in an approximately horizontal configuration, as they are expected to anchor the robot in place. Hence, their movement in the approximately horizontal position should be penalized, for example, proportionally to their forward speed.
It can also be observed that the bumper, scales, and the respective servomotor move forward whenever they are significantly rotated (in other words, when only the bumper is touching the ground). Hence, this forward movement while rotated should be rewarded.
Finally, in a well-performing gait, the transition between the approximately horizontal and the significantly rotated orientations of the scales and bumpers should be as fast as possible to avoid wasting time between them. Hence, the reward (or penalty) should reflect the joint-0 and joint-3 speeds whenever the respective bumper/scales are outside either state, to promote a prompt transition.
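As an illustration of the hints above, a few shaping terms might be sketched as follows; the helper name shaping_terms, the Euler-angle index used as the pitch, the 20-degree threshold, and the unit weights are assumptions to be tuned, not a prescribed solution.
def shaping_terms(inchworm_state: InchwormInterface) -> float:
    reward = 0.0
    # forward (x-axis) linear velocity of the front bumper in meters per second
    bumper_forward_velocity = inchworm_state.get_part_velocity("bumper-front")[3]
    # pitch of the front bumper; the Euler component and the threshold are assumptions
    # that should be verified in the simulator
    bumper_pitch = inchworm_state.get_part_rotation("bumper-front")[1]
    if abs(bumper_pitch) < 20.0:
        # approximately horizontal: the bumper/scales should anchor, so penalize sliding forward
        reward -= abs(bumper_forward_velocity)
    else:
        # significantly rotated: moving forward is desired, so reward it
        reward += bumper_forward_velocity
    return reward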
The inchworm robot states expected to be absorbing can be selected based on a significant deviation from the expected robot configuration during its motion, such as