About the developer

rmst
197 Stars 59 Forks MIT License 32 Commits 4 Opened issues

Description

Implementation of Deep Deterministic Policy Gradients using TensorFlow and OpenAI Gym

Deep Deterministic Policy Gradient

Warning: This repo is no longer maintained. For a more recent (and improved) implementation of DDPG, see https://github.com/openai/baselines/tree/master/baselines/ddpg

Paper: "Continuous control with deep reinforcement learning" - TP Lillicrap, JJ Hunt et al., 2015

Installation

Install Gym and TensorFlow. Then:

pip install pyglet # required for gym rendering
pip install jupyter # required only for visualization (see below)

git clone https://github.com/SimonRamstedt/ddpg.git # get ddpg

Usage

Example:

python run.py --outdir ../ddpg-results/experiment1 --env InvertedDoublePendulum-v1

For a complete overview of the available options, run:

python run.py -h
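
To queue several experiments, the example command can be generated per environment. The following is a minimal sketch, not part of the repository: `build_command` is a hypothetical helper, only the `--outdir` and `--env` flags are taken from the example above, and `Pendulum-v0` is an assumed second environment that may or may not work with `run.py`.

```python
import sys


def build_command(env, outdir_root="../ddpg-results"):
    """Return the argument list for one run.py invocation (hypothetical helper)."""
    outdir = "{}/{}".format(outdir_root, env)
    return [sys.executable, "run.py", "--outdir", outdir, "--env", env]


# Assumption: Pendulum-v0 is only an illustrative second environment.
envs = ["InvertedDoublePendulum-v1", "Pendulum-v0"]
commands = [build_command(env) for env in envs]

for cmd in commands:
    # Print the command instead of launching it, so the sketch has no side effects.
    print(" ".join(cmd))
    # To launch for real from the repository root:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Each run writes to its own output directory, so results from different environments stay separate.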

If you want to run in the cloud or on a university cluster, this might contain additional information.

Visualization

Example:

python dashboard.py --exdir ../ddpg-results/+

For a complete overview of the available options, run:

python dashboard.py -h

Known issues

  • No batch normalization yet
  • No conv nets yet (i.e. only learning from low dimensional states)
  • No proper seeding for reproducibility
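
As a rough illustration of what proper seeding could involve, the sketch below seeds Python's `random` module and, if it is installed, NumPy. `seed_everything` is a hypothetical helper, not a function in this repository. TensorFlow and the Gym environment would need their own seeding calls, which differ between versions and are therefore left out.

```python
import random


def seed_everything(seed):
    """Seed the generators this sketch knows about (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing more to seed here
    # Assumption: TensorFlow graph-level seeding and environment seeding
    # must be added separately, using the calls of the installed versions.


seed_everything(0)
first = [random.random() for _ in range(3)]
seed_everything(0)
second = [random.random() for _ in range(3)]
print(first == second)  # the same seed reproduces the same sequence
```

Seeding alone may not be enough for bit-identical runs, since GPU kernels and multi-threaded execution can introduce nondeterminism.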

Please contact me or open a GitHub issue if you encounter problems. Contributions are welcome!

Improvements beyond the original paper

  • Output normalization – the main cause of divergence is variation in return scales. Output normalization would probably solve this.
  • Prioritized experience replay – faster learning and better performance, especially with sparse rewards. Please write if you have or know of an implementation!
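
For readers unfamiliar with prioritized experience replay, here is a minimal sketch of the proportional variant, assuming priorities are set from absolute TD errors. It is not part of this repository. The class name, the default values of `alpha`, `beta` and `eps`, and the linear-time sampling (instead of a sum tree) are all choices made for illustration.

```python
import random


class PrioritizedReplay:
    """Proportional prioritized replay with O(n) sampling (illustrative only)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling
        self.eps = eps      # keeps every priority strictly positive
        self.items = []
        self.priorities = []
        self.next_slot = 0

    def add(self, transition):
        # New transitions get the current maximum priority so that each
        # one is likely to be replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(p)
        else:
            # Buffer is full: overwrite the oldest entry.
            self.items[self.next_slot] = transition
            self.priorities[self.next_slot] = p
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.items)
        idx = random.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights correct for non-uniform sampling;
        # here they are normalized by the largest weight in the batch.
        weights = [(n * probs[i]) ** -beta for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idx, [self.items[i] for i in idx], weights

    def update(self, indices, td_errors):
        # After a training step, refresh priorities from the new TD errors.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps


buffer = PrioritizedReplay(capacity=3)
for name in ["t0", "t1", "t2"]:
    buffer.add(name)
buffer.update([0, 1, 2], [0.0, 0.0, 100.0])  # t2 has a much larger TD error
indices, batch, is_weights = buffer.sample(batch_size=5)
print(batch)
```

In a DDPG training loop, the returned weights would scale each sample's critic loss, and `update` would be called with the TD errors computed for that batch. A sum tree would bring sampling down to logarithmic time for large buffers.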

Advanced Usage

Remote execution:

python run.py --outdir [email protected]:/some/remote/directory/+ --env InvertedDoublePendulum-v1
