This is an implementation following the paper: Asynchronous Methods for Deep Reinforcement Learning (https://arxiv.org/pdf/1602.01783.pdf)
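The objective in the paper combines a policy-gradient term weighted by an n-step advantage estimate, a value-regression term, and an entropy bonus. Below is a minimal NumPy sketch of that per-rollout loss; the function name, the 0.99 discount, and the 0.5 value-loss weight are illustrative defaults, not values taken from the scripts in this repo.

```python
# Sketch of the A3C objective for one t_max-step rollout (illustrative only).
import numpy as np

def a3c_loss(log_probs, values, entropies, rewards, bootstrap_value,
             gamma=0.99, value_coef=0.5, entropy_coef=0.2):
    """All array arguments have length t_max, ordered oldest to newest."""
    returns = np.zeros_like(rewards, dtype=np.float64)
    R = bootstrap_value                      # V(s_{t_max}) from the critic, 0 if terminal
    for t in reversed(range(len(rewards))):  # n-step discounted returns
        R = rewards[t] + gamma * R
        returns[t] = R
    advantages = returns - values
    policy_loss = -np.sum(log_probs * advantages)   # actor term (policy gradient)
    value_loss = np.sum((returns - values) ** 2)    # critic term (value regression)
    entropy_bonus = np.sum(entropies)               # exploration bonus
    return policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus

# Example with a 3-step rollout of dummy numbers:
print(a3c_loss(log_probs=np.array([-0.2, -0.4, -0.1]),
               values=np.array([1.0, 0.8, 0.5]),
               entropies=np.array([0.6, 0.5, 0.7]),
               rewards=np.array([1.0, 1.0, 1.0]),
               bootstrap_value=0.4))
```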
$ python cartpole_a3c.py --device=cpu --episodes=1000 --workers=4 --log_dir=cartpole_logs
The following graph shows the episode rewards (# workers: 4, entropy loss: 0.2)
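Each run starts several workers (`--workers`) that step their own copy of the environment and asynchronously apply updates to a shared model, as described in the paper. The sketch below only illustrates that threading pattern with a toy parameter vector and a stand-in gradient; the names and structure are hypothetical, not the actual code in cartpole_a3c.py.

```python
# Structural sketch of asynchronous workers sharing one set of parameters.
import threading
import numpy as np

shared_params = np.zeros(8)   # stands in for the global network's parameters
global_episode = 0
lock = threading.Lock()

def worker(worker_id, n_episodes, lr=0.01):
    global global_episode
    rng = np.random.default_rng(worker_id)
    for _ in range(n_episodes):
        local_params = shared_params.copy()          # sync: copy global -> local
        grad = rng.normal(size=local_params.shape)   # stand-in for an A3C gradient
        with lock:                                   # asynchronously update the globals
            shared_params[:] = shared_params - lr * grad
            global_episode += 1

threads = [threading.Thread(target=worker, args=(i, 250)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("total episodes:", global_episode)   # 4 workers x 250 = 1000
```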
To view the training curves, start TensorBoard:
$ tensorboard --logdir=cartpole_logs/
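The scripts presumably write per-episode summaries under `--log_dir`, which is what TensorBoard plots. A minimal example of logging an episode-reward scalar with the TF2 `tf.summary` API is shown below; the actual scripts may log through a different mechanism, and the tag name is just an example.

```python
# Write a scalar per episode so TensorBoard can plot a reward curve.
import tensorflow as tf

writer = tf.summary.create_file_writer("cartpole_logs/example_worker")
with writer.as_default():
    for episode, reward in enumerate([12.0, 35.0, 87.0, 200.0]):  # dummy rewards
        tf.summary.scalar("episode_reward", reward, step=episode)
writer.flush()
```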
$ python acrobot_a3c.py --device=cpu --episodes=500 --workers=4 --log_dir=acrobot_logs
The following graph shows the episode rewards (# workers: 4, entropy loss: 0.2)
$ python mountaincar_a3c.py --device=cpu --episodes=20000 --workers=8 --log_dir=mc_logs
The following graph shows the episode rewards (# workers: 8, entropy loss: 1.0, tmax=5)
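Here `tmax` is the paper's t_max, the number of steps each worker takes before computing an update, and the larger entropy weight (1.0 vs 0.2 above) keeps the policy exploratory, which matters because MountainCar gives no positive reward until the goal is reached. The snippet below only illustrates what the entropy bonus measures for a softmax policy over MountainCar's three actions; it is a standalone sketch, not code from mountaincar_a3c.py.

```python
# Entropy of a softmax policy over 3 actions: high entropy = near-uniform policy.
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def entropy(probs):
    return -np.sum(probs * np.log(probs + 1e-8))

print(entropy(softmax(np.array([0.0, 0.0, 0.0]))))   # ~1.10, maximal for 3 actions
print(entropy(softmax(np.array([5.0, 0.0, 0.0]))))   # ~0.08, nearly deterministic
```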
- OpenAI's A3C implementation (https://github.com/openai/universe-starter-agent)
- Arthur Juliani's blog post (https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2)