We will implement the Advantage Actor-Critic (A2C) algorithm to balance the cart-pole system. The gym-like continuous-cartpole environment included in the template rewards the agent for keeping the pendulum upright. Its interface follows the usual gym environment:
next_state, reward, done, info = env.step(actions)
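A minimal sketch of an interaction loop built on this interface follows. Because the template environment is not reproduced here, it uses a hypothetical stand-in, ToyPendulumEnv, whose step() has the same return signature. Its torque-driven pendulum dynamics, the cos(theta) reward, the max_steps time limit and the rollout() helper are all illustrative assumptions, not the template's code. Only the 15-degree termination rule mirrors the handout.

```python
import math
import random


class ToyPendulumEnv:
    """Hypothetical stand-in for the template's continuous-cartpole env.

    Only the step() return signature matches the handout; the dynamics
    (an inverted pendulum driven by a torque) and the reward are assumed.
    """

    def __init__(self, dt=0.02, max_steps=200, seed=0):
        self.dt = dt
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.limit = math.radians(15.0)  # episode ends beyond 15 degrees

    def reset(self):
        self.steps = 0
        self.theta = self.rng.uniform(-0.05, 0.05)  # small initial tilt (rad)
        self.theta_dot = 0.0
        return (self.theta, self.theta_dot)

    def step(self, actions):
        # Gravity pushes the pole away from upright; the action is a clipped torque.
        torque = max(-10.0, min(10.0, actions[0]))
        theta_ddot = 9.81 * math.sin(self.theta) + torque
        self.theta_dot += self.dt * theta_ddot
        self.theta += self.dt * self.theta_dot
        self.steps += 1
        reward = math.cos(self.theta)  # assumed: larger when closer to upright
        fell = abs(self.theta) > self.limit
        done = fell or self.steps >= self.max_steps
        info = {"fell": fell}
        return (self.theta, self.theta_dot), reward, done, info


def rollout(env, policy):
    """Run one episode and collect (state, action, reward, done) tuples."""
    state = env.reset()
    trajectory = []
    done = False
    while not done:
        actions = policy(state)
        next_state, reward, done, info = env.step(actions)
        trajectory.append((state, actions, reward, done))
        state = next_state
    return trajectory, info
```

For example, a hand-tuned PD controller such as `lambda s: [-20.0 * s[0] - 5.0 * s[1]]` keeps the toy pole upright until the time limit, while a zero-torque policy lets it fall past 15 degrees. In A2C, the learned actor replaces this fixed policy.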
The trajectory ends (done==True) when the pendulum deviates more than 15 degrees from the upward position.
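The done flag matters for A2C because the critic's value estimate must not be bootstrapped past a terminal step. The following is a minimal pure-Python sketch of the per-rollout arithmetic, assuming a one-dimensional Gaussian policy for the continuous action. The function names, the value-loss weight 0.5 and the entropy weight 0.01 are hypothetical choices for illustration. In an actual implementation these quantities would be computed with an autodiff framework so that gradients reach the actor and critic networks; plain floats only show the values.

```python
import math


def nstep_returns(rewards, dones, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout segment.

    Bootstraps from the critic's value of the state after the last step,
    but resets to zero after any step with done == True.
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0  # no future reward after a terminal step
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def gaussian_log_prob(action, mean, std):
    """Log-density of a 1-D Gaussian policy N(mean, std^2) at action."""
    return -0.5 * ((action - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def a2c_losses(rewards, dones, values, bootstrap_value, actions, means, stds,
               gamma=0.99, value_coef=0.5, entropy_coef=0.01):
    """Actor, critic and entropy terms of the A2C objective for one rollout."""
    n = len(rewards)
    returns = nstep_returns(rewards, dones, bootstrap_value, gamma)
    # Advantage = bootstrapped return minus the critic's estimate V(s_t).
    advantages = [g - v for g, v in zip(returns, values)]
    # Policy-gradient term; advantages act as constants (no gradient through the critic).
    actor_loss = -sum(gaussian_log_prob(a, m, s) * adv
                      for a, m, s, adv in zip(actions, means, stds, advantages)) / n
    # Critic regresses V(s_t) toward the returns (mean squared advantage).
    critic_loss = sum(adv ** 2 for adv in advantages) / n
    # Entropy of N(mean, std^2) is 0.5 * log(2*pi*e*std^2); a bonus keeps exploration up.
    entropy = sum(0.5 * math.log(2.0 * math.pi * math.e * s ** 2) for s in stds) / n
    total = actor_loss + value_coef * critic_loss - entropy_coef * entropy
    return {"returns": returns, "advantages": advantages, "actor_loss": actor_loss,
            "critic_loss": critic_loss, "entropy": entropy, "total": total}
```

With rewards [1, 1, 1], gamma = 0.5 and the last step marked done, the returns are [1.75, 1.5, 1.0] regardless of the bootstrap value. If no step is done, the bootstrap value enters the last return.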
Install the required packages:
pip3 install gym --user
pip3 install pyglet --user
Template archive: rl_labs_student_template.zip