
# Continuous-Time Policy Gradients (CTPG)

*Figure: CTPG vs Neural ODEs*

Here lives the source code for "Faster Policy Learning with Continuous-Time Gradients" by Samuel Ainsworth, Kendall Lowrey, John Thickstun, Zaid Harchaoui, and Siddhartha Srinivasa, presented at Learning for Dynamics and Control (L4DC) 2021.

Have you ever wondered what would happen if you took deep reinforcement learning and stripped away as much stochasticity as possible from the policy gradient estimators? Well, wonder no more!

## Usage

Much of the code was written against Julia 1.5.1. The MuJoCo-related experiments also require access to a MuJoCo installation. The DiffTaichi experiments require the DiffTaichi 0.7.12 differentiable simulator, which should be installed automatically by running `] build` in this Julia project.
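
As a minimal sketch of that setup (assuming the repository ships a standard `Project.toml`; the `(CTPG) pkg>` prompt name below is illustrative):

```julia
# Launch Julia 1.5.1 from the repository root with the project activated:
#
#   $ julia --project=.
#
# Press `]` to enter the Pkg REPL, then:
#
#   (CTPG) pkg> instantiate   # resolve and install the Julia dependencies
#   (CTPG) pkg> build         # run build steps, e.g. installing DiffTaichi 0.7.12
```

Note that only DiffTaichi is handled by the build step; MuJoCo itself must be installed separately.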