====== Reinforcement learning labs ======
In this lab we implement the advantage actor-critic (A2C) algorithm to balance a cartpole system. The gym-like continuous cartpole environment, which is part of the template, gives rewards for keeping the pendulum upright. Its interface matches the usual gym environment:
<code>next_state, reward, done, info = env.step(actions)
</code>

The trajectory ends (''done == True'') when the pendulum deviates by more than 15 degrees from the upright position.
{{:courses:b3b33vir:tutorials:cartpole_cont.gif?600|}}
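
The interaction loop implied by this interface can be sketched as follows. This is a minimal illustration, not the template's code: ''ToyPoleEnv'', its crude dynamics, its reward of 1 per step, and the ''rollout'' helper are all hypothetical stand-ins, used only so that the example runs on its own. Only the ''step'' return signature and the 15 degree termination rule come from the description above.

<code python>
import math

ANGLE_LIMIT = math.radians(15.0)  # termination threshold from the text


class ToyPoleEnv:
    """Hypothetical stand-in, NOT the template's environment or dynamics."""

    def reset(self):
        self.angle = 0.01     # pole angle from upright [rad]
        self.velocity = 0.0   # angular velocity [rad/s]
        return (self.angle, self.velocity)

    def step(self, action):
        # Crude unstable dynamics: gravity pulls the pole away from upright,
        # the scalar action acts as a corrective angular acceleration.
        dt = 0.02
        self.velocity += (9.81 * math.sin(self.angle) + action) * dt
        self.angle += self.velocity * dt
        done = abs(self.angle) > ANGLE_LIMIT
        reward = 0.0 if done else 1.0  # assumed reward, for illustration only
        return (self.angle, self.velocity), reward, done, {}


def rollout(env, policy, max_steps=500):
    """Run one episode and return lists of states, actions and rewards."""
    states, actions, rewards = [], [], []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done, info = env.step(action)
        states.append(state)
        actions.append(action)
        rewards.append(reward)
        state = next_state
        if done:  # pole left the +-15 degree band
            break
    return states, actions, rewards
</code>

With a policy that always returns ''0.0'' the toy pole falls and the episode ends early; a simple stabilizing policy such as ''lambda s: -30.0 * s[0] - 5.0 * s[1]'' keeps it running until ''max_steps''. In the lab, the policy would be the actor network and ''env'' the environment from the template.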
===== Install gym and pyglet =====

<code>
pip3 install gym --user
pip3 install pyglet --user
</code>


===== Download and unpack template =====

{{ :courses:b3b33vir:tutorials:rl_labs_student_template.zip |}}

===== Implement A2C =====

{{:courses:b3b33vir:tutorials:a2c_summary.png?800|}}
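
As a starting point, the quantities that a common A2C formulation needs can be sketched in plain Python. This is an assumption about the usual form of the algorithm (discounted returns, advantage as return minus critic value, policy-gradient actor loss, squared-error critic loss); the summary figure and the template may differ in details such as loss weighting or an entropy bonus. The function names ''discounted_returns'' and ''a2c_losses'' are hypothetical.

<code python>
def discounted_returns(rewards, gamma=0.99, bootstrap_value=0.0):
    """Compute R_t = r_t + gamma * R_{t+1}, with R_T = bootstrap_value.

    Use bootstrap_value=0.0 when the trajectory ended with done == True,
    otherwise the critic's estimate of the last state reached.
    """
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def a2c_losses(log_probs, values, returns):
    """Return (actor_loss, critic_loss), averaged over the trajectory.

    log_probs[t] : log pi(a_t | s_t) from the actor
    values[t]    : V(s_t) from the critic
    returns[t]   : discounted return R_t
    """
    # Advantage: how much better the outcome was than the critic expected.
    advantages = [R - v for R, v in zip(returns, values)]
    n = len(advantages)
    # Actor: raise the log-probability of actions with positive advantage.
    actor_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / n
    # Critic: regress V(s_t) towards the observed return.
    critic_loss = sum(adv ** 2 for adv in advantages) / n
    return actor_loss, critic_loss
</code>

For example, ''discounted_returns([1.0, 1.0, 1.0], gamma=0.5)'' gives ''[1.75, 1.5, 1.0]''. When the same computation is written with an automatic differentiation library, the advantage in the actor loss is normally treated as a constant, so that the actor loss does not propagate gradients into the critic.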