
HW 01 - Quadruped locomotion using random search

 (GIF: quadruped walking demo)

Your first homework will be to write a neural network and an optimization algorithm that make a quadruped model move towards a given location. You will not be using more advanced techniques such as gradient descent here, because the evaluation function is not differentiable with respect to the parameters. Instead, you will implement a simple heuristic random-search algorithm to optimize the task. The point of this homework is to familiarize you with formulating a problem as an optimization task and writing a very basic solver for it. You should gain an intuition of how such a basic optimization process behaves and how it depends on various parameters.

This is a 'warmup' homework and will be introduced in the labs in the 3rd week. It uses your knowledge of numpy and the very basics of neural networks. PyTorch is not required for this homework, but it will be used heavily in the following weeks, and the second homework will be on image segmentation.

PyBullet

PyBullet is a Python dynamics-simulation library that uses the well-known Bullet physics engine. It is probably the simplest, easiest-to-install and most flexible Python library for robotic simulation.

The library and the installation process are described here in a bit more detail.
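If you want to sanity-check your installation independently of the homework code, a minimal PyBullet session looks roughly like this (a sketch only; the plane.urdf asset ships with the pybullet_data package):

```python
import pybullet as p
import pybullet_data

# Connect in headless mode; use p.GUI instead to open the built-in visualizer.
client = p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())  # makes plane.urdf findable
p.setGravity(0, 0, -9.81)
plane = p.loadURDF("plane.urdf")

# Advance the simulation by 100 timesteps (1/240 s each by default).
for _ in range(100):
    p.stepSimulation()

p.disconnect()
```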

Quadruped Environment

You are given an environment which defines the quadruped. The environment defines everything that is required to simulate and render the robot. We will use an API very similar to OpenAI Gym.

The most important parts are the environment definition, contained in a single .py file, and the body of the quadruped, defined in a .urdf file. You are encouraged to go through all the functions to understand how it works.

If you are opening the project in PyCharm, make sure that the src folder is on the same level as the .idea directory. In other words, open the project inside the hw_1 folder so that PyCharm automatically sets the PYTHONPATH to point inside the hw_1 directory. Otherwise you will get import errors. Another solution is to modify the PYTHONPATH manually.

The quadruped model has 3 joints per leg. The observation is a concatenation of the joint angles, the torso orientation quaternion and four binary leg-contact indicators. The action is the vector of target joint angles and has a dimension of 12, one for each joint. The model has an internal PID controller for each joint, with parameters defined inside the model.
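As a rough illustration of the gym-style interface, an interaction loop with a random policy could look like the sketch below. The class name QuadrupedEnv, its constructor arguments and the episode length used here are placeholders; check the environment .py file in the template for the real interface.

```python
import numpy as np

# QuadrupedEnv is a hypothetical name -- use the actual class from the template.
env = QuadrupedEnv(animate=True)
obs = env.reset()

for _ in range(200):                          # arbitrary number of steps for this demo
    action = np.random.uniform(-1, 1, 12)     # 12 target joint angles, one per joint
    obs, reward, done, info = env.step(action)
    if done:
        break
```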

Your task

 (Figure: MDP environment diagram)

Design and train a neural network using numpy which makes the quadruped walk in the positive x direction with a given speed. You can formulate the quadruped environment as an MDP with state space $S$, action space $A$, transition function $s_{t+1} \sim T(s_t, a_t)$, reward function $r_t = r(s_t, a_t)$ and horizon $h$. We define a policy function $a_t \sim \pi_{w}(s_t)$ which maps states to actions and is parameterized by a vector $w$. The objective is to find a parameter vector $w$ which maximizes the expected cumulative reward.

$$ w^* = \underset{w}{\arg\max}\; \mathbb{E}_{s_{t+1} \sim T,\; a_t \sim \pi_w} \left[ \sum_{t=0}^{h} r(s_t, a_t) \right] $$
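To make the parameterization $w$ concrete, here is a minimal numpy policy sketch: a single hidden layer mapping the observation to 12 target joint angles, with all weights stored in one flat vector so the black-box optimizer can treat them as an opaque list of numbers. The class name matches the PolicyNet mentioned in the template, but the architecture, layer sizes and interface shown here are only one possible design; obs_dim=20 assumes the observation is exactly the 12 joint angles + 4 quaternion components + 4 contact flags described above, so verify it against the template.

```python
import numpy as np

class PolicyNet:
    """Tiny MLP policy; all weights live in one flat parameter vector w."""

    def __init__(self, obs_dim=20, act_dim=12, hidden=16):
        # Shapes of the two weight matrices and two bias vectors.
        self.shapes = [(obs_dim, hidden), (hidden,), (hidden, act_dim), (act_dim,)]
        self.n_params = sum(int(np.prod(s)) for s in self.shapes)

    def set_params(self, w):
        # Slice the flat vector back into weight matrices and biases.
        self.params, i = [], 0
        for s in self.shapes:
            n = int(np.prod(s))
            self.params.append(np.asarray(w[i:i + n]).reshape(s))
            i += n

    def forward(self, obs):
        W1, b1, W2, b2 = self.params
        h = np.tanh(obs @ W1 + b1)
        return np.tanh(h @ W2 + b2)   # squashed target joint angles
```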


This is done in episodes. In our case an episode lasts a fixed number of steps (the horizon $h$), after which the robot is reset to its initial position.
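Evaluating one candidate parameter vector therefore means running one full episode and summing the rewards, along these lines (a sketch using the hypothetical env and PolicyNet from above; the horizon value is a placeholder, use the one defined in the template):

```python
def evaluate(env, policy, w, horizon=200):
    """Run one episode with parameters w and return the cumulative reward."""
    policy.set_params(w)
    obs = env.reset()
    total_reward = 0.0
    for _ in range(horizon):
        obs, reward, done, _ = env.step(policy.forward(obs))
        total_reward += reward
        if done:
            break
    return total_reward
```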


You have to implement the policy (the PolicyNet class) and the random-search optimization in blackbox_opt.py.

Random search algorithm

The method that you will be using to optimize the weights of your policy is essentially random search. It can be summarized as follows:

 
 * step 1: Start with an initial guess for the solution (can be random)
 * step 2: Perturb the solution with some element of stochasticity
 * step 3: Map the solution onto your problem and evaluate it
 * step 4: If the perturbed solution is better than the best one so far, update the best solution (the new best is some function of the old best and the perturbed solution, e.g. simply the perturbed solution itself)
 * step 5: If satisfied, finish; otherwise go to step 2
Almost every stochastic iterative optimization algorithm is a variation on the above. You can make it 'smarter' than pure random perturbation by adding heuristics, for example adapting the perturbation magnitude over time, or any others that you come up with yourself.
Note that this is a 'black box' approach: it assumes almost nothing about the problem except the size of the solution vector. The solution vector is just a list of numbers, and the optimization algorithm doesn't know or care what each of these numbers represents.
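A minimal sketch of steps 1 to 5, assuming the evaluate function from the previous sketch and the simplest acceptance rule (keep the perturbed solution if it is better); the hyperparameters n_iters and sigma are placeholders:

```python
import numpy as np

def random_search(env, policy, n_iters=1000, sigma=0.1):
    # step 1: initial guess (random)
    w_best = 0.1 * np.random.randn(policy.n_params)
    r_best = evaluate(env, policy, w_best)

    for _ in range(n_iters):
        # step 2: perturb the current best solution
        w_new = w_best + sigma * np.random.randn(policy.n_params)
        # step 3: map onto the problem (run an episode) and evaluate
        r_new = evaluate(env, policy, w_new)
        # step 4: keep the perturbed solution if it is better
        if r_new > r_best:
            w_best, r_best = w_new, r_new
    # step 5: stop after a fixed budget of iterations
    return w_best
```

Here sigma acts as a step size; decaying it as the search progresses is one example of the heuristics mentioned above.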

Submission

If you downloaded the template before October 10, then before submitting to BRUTE please make the following changes:
  • Delete any PyTorch imports that you have in your project. The only image in BRUTE that has PyBullet does not have PyTorch.
  • In blackbox_opt.py, change the policy path to:

 policy_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'agents/quadruped_bbx.npy') 

Otherwise, just download the new template and copy-paste your solution there. The upload to BRUTE is being tested, so if it doesn't work as planned, don't be alarmed; just write me an email.

Submit the whole project as you downloaded it. The agents folder should contain your trained agent, and the PolicyNet class should contain your implementation of the control algorithm, which can be anything as long as it has learnable parameters. It will be tested in a similar fashion to the test function in blackbox_opt, so make sure that your entire script, including the test function, runs successfully.

When your submission is unzipped it has to have the following structure:
hw_1/src/Lab_2/<code here> 

Points

The reward function used for evaluation will be the same one as given in the template.

The maximum achievable score for this homework is 8 points.

Deadline

For every 24 hours after the deadline, you will lose 1 point. However, you will not receive a negative number of points, so the minimum is 0.