- REINFORCE (Policy Gradient Monte Carlo)
- Actor-Critic with Monte Carlo advantage estimate
- Advantage Actor-Critic (A2C)
- batch-norm not working in eval mode
- ideas from Atari preprocessing
- normalize input
- optimize for speed
- 4-frame stack
- plot grad dist/grad norm
- plot different losses
- plot more metrics (from Shultz presentation)
- mean by time
- remove float casts
- refactor rollout to use s_prime at every step
- use record episode stats
- merge wrappers and transforms
- make layers shared between versions
- check all conv paddings
- 5-step horizon
- use activation for value prediction
- add action to obs
- advantage normalization
- TD(0)
- experience replay
- TD(λ)
- MPI
- A3C
- compute running mean/std of metrics
- rename meta to info
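The TD(0), TD(λ) and advantage-normalization items could share one code path. Below is a minimal pure-Python sketch, assuming a single rollout stored as lists of per-step rewards, value estimates V(s_t) and done flags, plus a bootstrap value V(s_T) for the state after the last step. The names `lambda_returns` and `normalize` are hypothetical, not existing project code. Setting `lam=0` yields the TD(0) target r_t + γV(s_{t+1}), and `lam=1` yields the Monte Carlo return bootstrapped at the end of the rollout.

```python
from typing import List, Sequence


def lambda_returns(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    bootstrap_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Hypothetical helper: TD(lambda) targets for one rollout of length T.

    values[t] is V(s_t), bootstrap_value is V(s_T) for the state after the last
    step, and dones[t] is True if the episode ended right after step t.
    """
    returns = [0.0] * len(rewards)
    next_return = bootstrap_value  # G^lambda_{t+1}, seeded with V(s_T)
    next_value = bootstrap_value   # V(s_{t+1})
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # G^lambda_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G^lambda_{t+1})
        returns[t] = rewards[t] + gamma * not_done * (
            (1.0 - lam) * next_value + lam * next_return
        )
        next_return = returns[t]
        next_value = values[t]
    return returns


def normalize(xs: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Standardize a batch (e.g. advantages) to zero mean and unit std."""
    mean = sum(xs) / len(xs)
    std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / (std + eps) for x in xs]
```

The advantage for step t would then be `returns[t] - values[t]`, passed through `normalize` before computing the actor loss. Whether to normalize per batch or per rollout is a design choice this sketch leaves open.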
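The running mean/std item (and input normalization) can be handled without storing past samples by merging each batch's statistics into running totals with the parallel-variance update (Chan et al.). The class below is a hypothetical, scalar-only sketch of that idea. A real observation normalizer would track per-feature statistics, for example with NumPy arrays.

```python
from typing import Sequence


class RunningMeanStd:
    """Hypothetical helper: running mean/variance of a scalar stream.

    Each batch is merged into the totals with the parallel-variance formula,
    so the result matches computing the statistics over all data at once.
    """

    def __init__(self) -> None:
        self.mean = 0.0
        self.var = 0.0  # population variance of everything seen so far
        self.count = 0

    def update(self, batch: Sequence[float]) -> None:
        n = len(batch)
        if n == 0:
            return
        batch_mean = sum(batch) / n
        batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
        total = self.count + n
        delta = batch_mean - self.mean
        # Merged sum of squared deviations: M2_a + M2_b + delta^2 * n_a * n_b / n
        m2 = self.var * self.count + batch_var * n + delta ** 2 * self.count * n / total
        self.mean += delta * n / total
        self.var = m2 / total
        self.count = total

    @property
    def std(self) -> float:
        return self.var ** 0.5

    def normalize(self, x: float, eps: float = 1e-8) -> float:
        # Standardize a new value with the current running statistics.
        return (x - self.mean) / (self.std + eps)
```

For example, calling `update([1.0, 2.0])` and then `update([3.0, 4.0, 5.0])` gives a mean of about 3.0 and a variance of about 2.0, the same as computing them over `[1, 2, 3, 4, 5]` in one pass.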