Reinforcement learning labs

We will implement the advantage actor-critic (A2C) algorithm to balance a cartpole system. The gym-like continuous-cartpole environment, which is part of the template, rewards the agent for keeping the pendulum upright. Its interface follows the usual gym convention:

next_state, reward, done, info = env.step(actions)

The trajectory ends (done == True) when the pendulum deviates by more than 15 degrees from the upright position.
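The interaction loop implied by this interface can be sketched as follows. This is only an illustration: StubCartPole is a hypothetical stand-in, not the environment shipped with the template, and its toy dynamics, state layout, reward values, and action range of [-1, 1] are all assumptions. Only the step signature and the 15-degree termination rule come from the description above.

```python
import math
import random


class StubCartPole:
    """Hypothetical stand-in for the template environment (toy dynamics only)."""

    ANGLE_LIMIT = math.radians(15)  # termination threshold from the text

    def reset(self):
        # Assumed state layout: [cart position, cart velocity, pole angle, pole angular velocity]
        self.state = [0.0, 0.0, 0.01, 0.0]
        return list(self.state)

    def step(self, action):
        x, x_dot, theta, theta_dot = self.state
        # Made-up unstable update, present only so the loop has something to run against.
        theta_dot += 0.05 * theta - 0.02 * action
        theta += theta_dot
        x_dot += 0.02 * action
        x += x_dot
        self.state = [x, x_dot, theta, theta_dot]
        done = abs(theta) > self.ANGLE_LIMIT
        reward = 0.0 if done else 1.0  # assumed reward: 1 per step while upright
        return list(self.state), reward, done, {}


def run_episode(env, policy, max_steps=200):
    """Collect one trajectory of (state, action, reward, next_state, done) tuples."""
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done, info = env.step(action)
        trajectory.append((state, action, reward, next_state, done))
        state = next_state
        if done:  # pendulum left the 15-degree band, so the trajectory ends
            break
    return trajectory


random.seed(0)
random_policy = lambda state: random.uniform(-1.0, 1.0)  # placeholder for the actor
trajectory = run_episode(StubCartPole(), random_policy)
print("steps:", len(trajectory), "return:", sum(t[2] for t in trajectory))
```

In the lab itself, the random policy would be replaced by sampling from the actor network, and the collected tuples would feed the A2C update.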

Install gym and pyglet

pip3 install gym --user

pip3 install pyglet --user

Download and unpack template

Implement A2C
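As a starting point, the quantities at the core of an A2C update can be sketched in plain Python. This is a minimal sketch and not the template's code: the function names discounted_returns and a2c_losses, the discount factor, and the example numbers are illustrative assumptions. A real implementation would compute the same expressions on tensors in an autodiff framework and treat the advantage as a constant inside the actor loss.

```python
def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """Return R_t = r_t + gamma * R_{t+1}, seeded with the critic's value
    of the state after the last step (0.0 if the trajectory terminated)."""
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def a2c_losses(log_probs, values, returns):
    """Compute actor and critic losses from per-step quantities.

    log_probs: log pi(a_t | s_t) of the actions actually taken
    values:    critic estimates V(s_t)
    returns:   discounted returns R_t
    """
    advantages = [R - v for R, v in zip(returns, values)]
    n = len(advantages)
    # Policy gradient term: raise the log-probability of actions with positive advantage.
    actor_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / n
    # Critic regression term: mean squared error between V(s_t) and R_t.
    critic_loss = sum(adv ** 2 for adv in advantages) / n
    return actor_loss, critic_loss, advantages


# Illustrative numbers only: three steps, terminated trajectory (bootstrap value 0).
rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, bootstrap_value=0.0, gamma=0.5)
print(returns)  # → [1.75, 1.5, 1.0]

actor_loss, critic_loss, advantages = a2c_losses(
    log_probs=[-1.0, -1.0, -1.0],
    values=[1.0, 1.0, 1.0],
    returns=returns,
)
print(advantages)  # → [0.75, 0.5, 0.0]
```

The two losses are typically summed (often with a weighting coefficient and an entropy bonus) before the gradient step; those choices are left open here.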

courses/b3b33vir/tutorials/lab_6.txt · Last modified: 2021/12/15 16:13 by zimmerk