Search
Your first homework is to write a neural network and an optimization algorithm that make a quadruped model move towards a given location. You will not use more advanced techniques such as gradient descent here, because the evaluation function is not differentiable with respect to the parameters. Instead, you will implement a simple heuristic random search algorithm to optimize the task. The point of this homework is to familiarize you with formulating a problem as an optimization task and writing a very basic solver for it. You should gain an intuition for how such a basic optimization process behaves and how it depends on various parameters.
This is a 'warmup' homework and will be introduced in the labs in the 2nd week. It uses your knowledge of numpy and the very basics of neural networks. PyTorch is not required for this homework but will be used heavily in the following weeks.
PyBullet is a dynamics simulation library in Python which uses the well-known Bullet physics engine. It is arguably the simplest, easiest-to-install, and most flexible Python library for robotic simulation.
The library and the installation process are described here in a bit more detail.
You are given an environment which defines the quadruped and everything required to simulate and render the robot. We will use an API very similar to that of OpenAI gym.
The most important parts are the environment's `reset` and `step` functions, which follow the OpenAI gym convention.
Our robot environment is defined in a single .py file and the body of the quadruped in a .urdf file. You are encouraged to go through all functions to understand how it works.
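To make this concrete, a gym-style interaction loop looks roughly like the sketch below. The class name `QuadrupedEnv` and the exact return values of `step` are assumptions for illustration, not the template's actual names; check the environment's .py file for the real interface.

```python
import numpy as np

# Hypothetical interaction loop; `QuadrupedEnv` and the 4-tuple returned by
# step() are assumptions based on the gym-like API described above.
env = QuadrupedEnv()
obs = env.reset()                    # reset the robot and get the first observation

for t in range(100):                 # roll out a fixed number of simulation steps
    action = np.zeros(12)            # 12 target joint angles (here: hold zero)
    obs, reward, done, info = env.step(action)   # advance the physics one step
```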
Make sure the `hw_1` directory is on your `PYTHONPATH` so that the environment module can be imported.
The quadruped model has 3 joints per leg. The observations are a concatenation of the joint angles, the torso quaternion, and four binary leg contact indicators. The output is the target joint angles and has dimension 12, one per joint. The model has an internal PID controller for each joint, with parameters defined inside the model.
Design and train a neural network using numpy which makes the quadruped walk in the positive x direction at a given speed. You can formulate the quadruped environment as an MDP with state space $S$, action space $A$, transition function $s_{t+1} \sim T(s_t, a_t)$, reward function $r_t = F(s_t, a_t)$ and horizon $h$. We define a policy function $a_t \sim \pi_{w}(s_t)$ which maps states to actions and is parameterized by a vector $w$. The objective is to find the parameter vector $w$ which maximizes the cumulative reward:
$$ w^* = \underset{w}{\arg\max}\; \mathbb{E}_{s_{t+1} \sim T,\, a_t \sim \pi_w}\left[ \sum_{t=0}^{h} r(s_t, a_t) \right] $$
This is done in episodes. In our case an episode lasts a fixed number of steps (the horizon $h$), after which the robot is reset to its initial position.
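The following is a minimal numpy sketch of a policy with a flat parameter vector and an episode-based evaluation function. The dimensions (20 observations: 12 joint angles + 4 quaternion components + 4 contact flags; 12 actions), the hidden-layer size, and the gym-style `reset`/`step` calls are assumptions; the template's actual `PolicyNet` class may look different.

```python
import numpy as np

class PolicyNet:
    """Sketch of a single-hidden-layer MLP whose weights live in one flat
    vector w, so a search algorithm can perturb all parameters at once."""
    def __init__(self, obs_dim=20, act_dim=12, hidden=16):
        self.shapes = [(obs_dim, hidden), (hidden,), (hidden, act_dim), (act_dim,)]
        self.n_params = sum(int(np.prod(s)) for s in self.shapes)

    def forward(self, w, obs):
        # Unpack the flat parameter vector into weight matrices and biases.
        params, i = [], 0
        for s in self.shapes:
            n = int(np.prod(s))
            params.append(w[i:i + n].reshape(s))
            i += n
        W1, b1, W2, b2 = params
        h = np.tanh(obs @ W1 + b1)
        return np.tanh(h @ W2 + b2)   # 12 target joint angles, squashed to [-1, 1]

def evaluate(env, policy, w, horizon):
    """Run one episode with parameters w and return the cumulative reward."""
    obs = env.reset()
    total = 0.0
    for _ in range(horizon):
        obs, reward, done, info = env.step(policy.forward(w, obs))
        total += reward
    return total
```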
The method that you will use to optimize the weights of your policy is essentially random search, which can be summarized by the following steps (a code sketch follows the list):
* step 1: Start with an initial guess for the solution (can be random)
* step 2: Perturb the solution with some element of stochasticity
* step 3: Map the solution onto your problem and evaluate it
* step 4: If the new solution is better than the best so far, then the new best solution is a function of the old and the perturbed solution
* step 5: If satisfied, finish; otherwise go to step 2
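Continuing the sketch above, a minimal random-search loop could look as follows. `iters`, `sigma`, and the initialization scale are assumed hyperparameters; here step 4's "function of the old and perturbed solution" is simply "keep the perturbed copy", the most basic variant.

```python
def random_search(env, policy, horizon, iters=1000, sigma=0.1):
    """Steps 1-5 above: keep a best-so-far parameter vector, perturb it
    with Gaussian noise, and keep the perturbed copy whenever it scores higher."""
    w_best = 0.1 * np.random.randn(policy.n_params)   # step 1: initial guess
    r_best = evaluate(env, policy, w_best, horizon)
    for _ in range(iters):
        w_new = w_best + sigma * np.random.randn(policy.n_params)  # step 2: perturb
        r_new = evaluate(env, policy, w_new, horizon)              # step 3: evaluate
        if r_new > r_best:                                         # step 4: keep if better
            w_best, r_best = w_new, r_new
    return w_best                                                  # step 5: finish
```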
Submit the whole project directory as you downloaded it, with your solution in `hw_1/src/Lab_2/<code here>`. The agents folder should contain your trained agent, and the `PolicyNet` class should contain your implementation of the control algorithm, which can be anything as long as it has learnable parameters. It will be tested in a similar fashion to the `test` function in `blackbox_opt`, so make sure that your entire script, including the `test` function, runs successfully.
The reward function used to evaluate the environment will be the same as the one given in the template.
The maximum achievable score for this homework is 8 points.
For every 24 hours after the deadline, you will lose 1 point. However, your score will not go below zero, so the minimum is 0.