This is an implementation of PPO for CartPole-v1 from the OpenAI Gym environment. The algorithm is based on the papers "High-Dimensional Continuous Control Using Generalized Advantage Estimation" and "Proximal Policy Optimization Algorithms".
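The two papers above contribute the two core pieces of the training loop: GAE for computing advantages, and the clipped surrogate objective for the policy update. As a rough illustration (a minimal numpy sketch, not the exact code in PPO.py; the function names and default gamma/lambda/epsilon values here are assumptions):

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    # Generalized Advantage Estimation over one episode.
    # `values` has one extra entry: the value estimate for the state
    # after the last reward (0.0 if the episode terminated).
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    deltas = rewards + gamma * values[1:] - values[:-1]  # TD residuals
    advantages = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):  # discounted sum of residuals
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages

def ppo_clip_objective(ratio, advantage, eps=0.2):
    # PPO's clipped surrogate: take the pessimistic (min) of the
    # unclipped and clipped probability-ratio terms.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantage, clipped * advantage)
```

In training, the negative mean of `ppo_clip_objective` over a batch would serve as the policy loss, with `advantage` supplied by `gae_advantages`.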
Output:
Episode: 25 Score: 28.4
Episode: 50 Score: 46.72
Episode: 75 Score: 173.72
Episode: 100 Score: 352.88
Episode: 125 Score: 461.24
Episode: 150 Score: 486.24
Episode: 175 Score: 492.8
Episode: 200 Score: 442.0
Episode: 225 Score: 485.64
Episode: 250 Score: 490.48
Episode: 275 Score: 491.08
Episode: 300 Score: 500.0
Solved!
"Solved" here is defined as 25 consecutive episodes with a score of 500.0.
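That stopping condition can be sketched as a small helper (an illustrative snippet, assuming scores are collected in a list; the function name is hypothetical, not from PPO.py):

```python
def is_solved(scores, window=25, target=500.0):
    # True once the last `window` episodes all reached `target`.
    recent = scores[-window:]
    return len(recent) == window and all(s >= target for s in recent)
```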
Dependencies are numpy, tensorflow, and gym, all of which can be installed via pip. To run: python PPO.py