====== Reinforcement learning labs ======

We will implement the A2C algorithm for balancing the cartpole system. The gym-like continuous-cartpole environment, which is part of the template, provides rewards for keeping the pendulum in an upward position. The interface corresponds to the usual gym environment:

  next_state, reward, done, info = env.step(actions)

The trajectory ends (done==True) when the pendulum deviates more than 15 degrees from the upward position.

{{:courses:b3b33vir:tutorials:cartpole_cont.gif?600|}}

===== Install gym and pyglet =====

  pip3 install gym --user
  pip3 install pyglet --user

===== Download and unpack template =====

{{ :courses:b3b33vir:tutorials:rl_labs_student_template.zip |}}

===== Implement A2C =====

{{:courses:b3b33vir:tutorials:a2c_summary.png?800|}}
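
As a starting point, the return and advantage computation that A2C commonly relies on can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the template's code: the function names ''n_step_returns'' and ''advantages'', the ''bootstrap_value'' argument, and the discount ''gamma'' are hypothetical and may differ from the names and the exact formulation used in the summary figure and the template.

```python
def n_step_returns(rewards, dones, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout, computed backwards in time.

    rewards, dones: per-step lists collected from env.step(...)
    bootstrap_value: critic estimate V(s) for the state after the last step
    (hypothetical argument name; assumed to come from your critic network).
    """
    returns = []
    running = bootstrap_value
    for reward, done in zip(reversed(rewards), reversed(dones)):
        # When the episode ended at this step, nothing beyond it is counted.
        running = reward + gamma * running * (0.0 if done else 1.0)
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """Advantage estimate: observed return minus the critic's prediction."""
    return [g - v for g, v in zip(returns, values)]


# Toy rollout of three steps; the episode terminates at the last step.
rewards = [1.0, 1.0, 1.0]
dones = [False, False, True]
values = [2.0, 1.5, 0.5]  # critic predictions for the visited states

returns = n_step_returns(rewards, dones, bootstrap_value=5.0, gamma=0.5)
adv = advantages(returns, values)
print(returns)  # [1.75, 1.5, 1.0]
print(adv)      # [-0.25, 0.0, 0.5]
```

In a typical A2C setup, the actor is then trained by minimizing the mean of ''-log_prob(action) * advantage'', with the advantage treated as a constant, and the critic by minimizing the mean squared advantage. Both losses need an automatic differentiation framework, which is left out of this sketch.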