148 Stars 16 Forks Apache License 2.0 9 Commits 3 Opened issues


Repository for the paper "Planning to Explore via Self-Supervised World Models"

Planning to Explore via Self-Supervised World Models

[Project Website] [Demo Video] [Long Talk] [TF2 version]

Ramanan Sekar*1, Oleh Rybkin*1, Kostas Daniilidis1, Pieter Abbeel2, Danijar Hafner3,4, Deepak Pathak5,6
(* equal contribution)

1University of Pennsylvania 2UC Berkeley 3Google Research, Brain Team 4University of Toronto 5Carnegie Mellon University 6Facebook AI Research

This is a TensorFlow-based implementation of our paper on planning to explore via self-supervised world models. This work focuses on self-supervised exploration, where an agent explores a visual environment without yet knowing the tasks it will later be asked to solve. While current methods often learn reactive exploration behaviors to maximize retrospective novelty, we learn a world model trained from images to plan for expected surprise. Novelty is estimated as ensemble disagreement in the latent space of the world model. Exploring and learning the world model without rewards, our approach, Plan2Explore, efficiently adapts to a range of control tasks with high-dimensional image inputs. If you find this work useful in your research, please cite:

    title={Planning to Explore via Self-Supervised World Models},
    author={Ramanan Sekar and Oleh Rybkin and Kostas Daniilidis and Pieter Abbeel and Danijar Hafner and Deepak Pathak},
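
The description above says novelty is estimated as ensemble disagreement in the latent space of the world model. A minimal sketch of that idea follows. It assumes disagreement is measured as the variance across ensemble members' predictions, averaged over feature dimensions. The function name and the toy numbers are hypothetical and are not taken from this repository's code.

```python
from statistics import pvariance


def ensemble_disagreement(predictions):
    """Mean per-dimension variance across ensemble members.

    predictions: list of K vectors (one per ensemble member), each of length D.
    This is an illustrative stand-in, not the repository's implementation.
    """
    if len(predictions) < 2:
        raise ValueError("need at least two ensemble members")
    dims = list(zip(*predictions))  # regroup the values by feature dimension
    return sum(pvariance(values) for values in dims) / len(dims)


# All members agree on a familiar state, so the disagreement is zero.
familiar = [[0.5, 1.0], [0.5, 1.0], [0.5, 1.0]]
# Members diverge on an unfamiliar state, so the disagreement is positive.
novel = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]

print(ensemble_disagreement(familiar))
print(ensemble_disagreement(novel))
```

Under this reading, an exploration policy would be rewarded for reaching states where the second case holds, since those are the states the ensemble has not yet learned to predict consistently.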

TF2 implementation

Please note that a TensorFlow 2 implementation based on Dreamer V2 is now available via the [TF2 version] link above. To replicate the zero-shot results in the TF2 implementation, run

python --logdir logdir/walker_walk/zero_shot --configs defaults dmc --task dmc_walker_walk --expl_behavior plan2explore --expl_until 4e6 --steps 4e6 --grad_heads: 'image'

To replicate few-shot results in the TF2 implementation, run

python --logdir logdir/walker_walk/zero_shot --configs defaults dmc --task dmc_walker_walk --expl_behavior plan2explore --expl_until 1e6 --steps 1.1e6 --grad_heads: 'image'
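
The two commands differ only in the values of --expl_until and --steps. The short calculation below shows what that difference amounts to. It assumes --expl_until is the step at which the agent switches from exploration to task-driven behavior and --steps is the total training budget. This is an interpretation of the flags, not something this README states, and task_steps is a hypothetical helper.

```python
def task_steps(expl_until, steps):
    """Steps left for task-driven behavior once exploration ends (assumed semantics)."""
    return max(0.0, steps - expl_until)


# Zero-shot: exploration covers the whole budget, leaving no task-driven steps.
zero_shot = task_steps(expl_until=4e6, steps=4e6)
# Few-shot: exploration stops early and the remainder is spent on the task.
few_shot = task_steps(expl_until=1e6, steps=1.1e6)

print(int(zero_shot), int(few_shot))
```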

TF1 implementation (this repo)

Setting up repository

  git clone
  cd plan2explore/

  python3.6 -m venv Plan2Explore
  source $PWD/Plan2Explore/bin/activate
  pip install --upgrade pip


  • cuDNN-7.6, CUDA-9.0, Python-3.6, TensorFlow 1.14.0, TensorFlow Probability 0.7.0, DeepMind Control Suite (rendering option recommended), gym, imageio, matplotlib, ruamel.yaml, scikit-image, scipy.
  • MuJoCo-200: Download binaries, put license file inside and add path to .bash_env

  • Run the following command to have the necessary dependencies on the OS:

    apt-get update && apt-get install -y --no-install-recommends \
    build-essential nano libssl-dev libffi-dev libxml2-dev libxslt1-dev \
    zlib1g-dev python3-setuptools python3-pip libglew2.0 libgl1-mesa-glx \
    libopenmpi-dev libgl1-mesa-dev libosmesa6 libglfw3 patchelf xserver-xorg-dev xpra
  • Quick setup for exact replication (Recommended):

    pip install -r requirements.txt
  • The code was tested under Ubuntu 18.

Run code

To train an agent, install the dependencies and then run one of the commands below. Each command runs the default settings of the corresponding experiment reported in the paper. Change the task in --params {tasks:...} as required.

  • Our Plan2Explore Agent Zero-shot Experiments:

    python3 -m plan2explore.scripts.train --expID 1001_walker_walk_plan2explore_zeroshot \
    --params '{defaults: [disagree], tasks: [walker_walk]}'
  • Random Zero-shot Experiments:

    python3 -m plan2explore.scripts.train --expID 1002_walker_walk_random_zeroshot \
    --params '{defaults: [random], tasks: [walker_walk]}'
  • Curiosity Zero-shot Experiments:

    python3 -m plan2explore.scripts.train --expID 1003_walker_walk_curious_zeroshot \
    --params '{defaults: [prediction_error], tasks: [walker_walk]}'
  • Supervised Oracle (Dreamer) Experiments:

    python3 -m plan2explore.scripts.train --expID 1004_walker_walk_dreamer \
    --params '{defaults: [dreamer], tasks: [walker_walk]}'
  • MAX Zero-shot Experiments:

    python3 -m plan2explore.scripts.train --expID 1005_walker_walk_max_zeroshot \
    --params '{defaults: [disagree], tasks: [walker_walk], use_max_objective: True}'
  • Retrospective Agent Zero-shot Experiments:

    python3 -m plan2explore.scripts.train --expID 1006_walker_walk_retrospective_zeroshot \
    --params '{defaults: [disagree], tasks: [walker_walk], exploration_imagination_horizon: 1, curious_action_bootstrap: False, curious_value_bootstrap: False}'
  • Our Plan2Explore Agent Few-shot Adaptation Experiments (note: to run the adaptation experiments for other agents, use the same command setup as the Zero-shot experiments above and add the adaptation flags given here):

    python3 -m plan2explore.scripts.train --expID 1007_walker_walk_plan2explore_adapt \
    --params '{defaults: [disagree], tasks: [walker_walk], adaptation: True, adaptation_step: 5e6, max_steps: 5.75e6}'
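
The note on the few-shot bullet says the adaptation flags can be combined with any of the zero-shot setups. One way to compose such commands without retyping the parameter string is sketched below. build_params and build_command are hypothetical helpers, not part of this repository, and the experiment ID in the example is invented for illustration. The sketch quotes the --params value so the shell passes it to the script as a single argument.

```python
import shlex

# Adaptation flags copied from the few-shot command in this README.
ADAPTATION_FLAGS = {
    "adaptation": "True",
    "adaptation_step": "5e6",
    "max_steps": "5.75e6",
}


def build_params(defaults, task, extra=None):
    """Format the --params value in the style used by the commands above."""
    fields = {"defaults": f"[{defaults}]", "tasks": f"[{task}]"}
    fields.update(extra or {})
    body = ", ".join(f"{key}: {value}" for key, value in fields.items())
    return "{" + body + "}"


def build_command(exp_id, defaults, task, extra=None):
    """Return a shell-safe training command line as a single string."""
    args = [
        "python3", "-m", "plan2explore.scripts.train",
        "--expID", exp_id,
        "--params", build_params(defaults, task, extra),
    ]
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical experiment ID: few-shot adaptation for the random baseline.
print(build_command("1008_walker_walk_random_adapt", "random", "walker_walk",
                    ADAPTATION_FLAGS))
```

Swapping the defaults argument for dreamer or prediction_error would give the corresponding command for the other baselines listed in this README.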


These are good places to start when modifying the code:

| Directory | Description |
| :-------- | :---------- |
| | Add new parameters or change defaults. |
| | Add or modify environments. |
| | Modify objectives or optimization processes. |
| | Add or modify latent transition models. |
| | Add or modify encoder, decoder, or one-step models. |
| | Change MPC agents, add new wrappers, modify simulations. |

The available tasks are listed in . The hyper-parameters can be found in . The possible configurations for main experiment defaults are disagree, random, dreamer, or prediction_error. To get started, some quick hyper-parameters for playing around with Plan2Explore are

This codebase was built on top of Dreamer.
