Reinforcement learning labs

We will implement the advantage actor-critic (A2C) algorithm to balance a cart-pole system. The gym-like continuous-cartpole environment, which is part of the template, gives rewards for keeping the pendulum upright. Its interface follows the usual gym convention:

next_state, reward, done, info = env.step(actions)

The trajectory ends (done == True) when the pendulum deviates by more than 15 degrees from the upright position.
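
The interaction loop implied by this interface can be sketched as follows. The sketch does not use the template's environment: `StubCartPole` is a hypothetical stand-in with made-up toy dynamics and an assumed reward of 1 per non-terminal step, and `run_episode` and its `policy` argument are illustrative names. Only the four-value return of `step` and the 15-degree termination rule come from the description above.

```python
import math
import random


class StubCartPole:
    """Hypothetical stand-in for the template's environment (NOT the real one)."""

    MAX_ANGLE = math.radians(15.0)  # termination threshold from the text

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.angle = 0.0

    def reset(self):
        self.angle = 0.0
        return [self.angle]

    def step(self, action):
        # Toy dynamics (assumption): the action nudges the angle, plus noise.
        self.angle += 0.05 * action + self.rng.uniform(-0.02, 0.02)
        done = abs(self.angle) > self.MAX_ANGLE
        reward = 0.0 if done else 1.0  # assumed reward shape
        return [self.angle], reward, done, {}


def run_episode(env, policy, max_steps=500):
    """Roll out one trajectory and return (total reward, number of steps)."""
    state = env.reset()
    total, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done, info = env.step(action)
        total += reward
        steps += 1
        if done:  # the pendulum left the +/- 15 degree band
            break
    return total, steps


# Usage: a random continuous policy, standing in for the actor network.
total, steps = run_episode(StubCartPole(), lambda state: random.uniform(-1.0, 1.0))
print(f"episode return: {total}, steps: {steps}")
```

In the lab itself, the stub would be replaced by the environment from the template and the random policy by the actor being trained.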

Install gym and pyglet

pip3 install gym --user

pip3 install pyglet --user

Download and unpack template

rl_labs_student_template.zip

Implement A2C