Continuous-Time Policy Gradients (CTPG)

Here lives the source code for "Faster Policy Learning with Continuous-Time Gradients" by Samuel Ainsworth, Kendall Lowrey, John Thickstun, Zaid Harchaoui, and Siddhartha Srinivasa, presented at Learning for Dynamics and Control (L4DC) 2021.

Have you ever wondered what would happen if you took deep reinforcement learning and stripped away as much stochasticity as possible from the policy gradient estimators? Well, wonder no more!

Usage

Much of the code was written against Julia version 1.5.1. The MuJoCo-related experiments also require access to a MuJoCo installation. The DiffTaichi experiments require the DiffTaichi 0.7.12 differentiable simulator, which should be installed automatically by running ] build (the Pkg REPL's build command) in this Julia project.
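
For those who prefer a script over the Pkg REPL, the setup step can be sketched as below. This is a minimal sketch rather than an official install script: it assumes Julia is started from the root of a clone of this repository (the directory containing Project.toml and Manifest.toml), and it uses only standard Pkg functions.

```julia
using Pkg

# Activate the project in the current directory. Assumption: this is the
# repository root, where Project.toml and Manifest.toml live.
Pkg.activate(".")

# Install the dependency versions recorded in Manifest.toml.
Pkg.instantiate()

# Run the project's build step; this is the scripted equivalent of
# typing `] build` in the Pkg REPL.
Pkg.build()
```

The MuJoCo installation is not covered by this sketch and still has to be available separately for the MuJoCo experiments.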

