Model-Based Policy Optimization

Code to reproduce the experiments in When to Trust Your Model: Model-Based Policy Optimization.


  1. Install MuJoCo 1.50 at
    and copy your license key to
  2. Clone
    git clone --recursive
  3. Create a conda environment and install mbpo
    cd mbpo
    conda env create -f environment/gpu-env.yml
    conda activate mbpo
    pip install -e viskit
    pip install -e .


Configuration files can be found in


mbpo run_local examples.development --config=examples.config.halfcheetah.0 --gpus=1 --trial-gpus=1

Currently only running locally is supported.

New environments

To run on a different environment, you can modify the provided template. You will also need to provide the termination function for the environment in

. If you name the file the lowercase version of the environment name, it will be found automatically. See
for an example.
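As a rough illustration of what a termination function might look like, here is a minimal sketch. The class name `StaticFns`, the signature `termination_fn(obs, act, next_obs)`, the batched array shapes, and the height threshold are all assumptions made for this example and are not confirmed by the text above; consult the example file referenced above for the interface the codebase actually expects.

```python
import numpy as np


class StaticFns:
    """Hypothetical container for environment-specific static functions."""

    @staticmethod
    def termination_fn(obs, act, next_obs):
        # Assumed layout: batched 2-D arrays of shape (batch, dim).
        assert obs.ndim == act.ndim == next_obs.ndim == 2

        # Hypothetical rule: a transition is terminal when the first
        # observation dimension (treated here as a height) is not above 0.7,
        # or when any entry of the next observation is non-finite.
        height = next_obs[:, 0]
        not_done = np.isfinite(next_obs).all(axis=-1) & (height > 0.7)

        # Return a boolean column vector, one flag per transition.
        return ~not_done[:, None]
```

An environment with no early termination could simply return an all-False array of shape `(batch, 1)` under the same assumed interface.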


This codebase contains viskit as a submodule. You can view saved runs with:

viskit ~/ray_mbpo --port 6008
assuming you used the default


The rollout length schedule is defined by a length-4 list in a config file. The format is [start_epoch, end_epoch, start_length, end_length], so the following:

'rollout_schedule': [20, 100, 1, 5]

corresponds to a model rollout length linearly increasing from 1 to 5 over epochs 20 to 100.
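To make the schedule concrete, the sketch below interpolates a rollout length from such a list. The function name `rollout_length` is hypothetical. Holding the length constant outside the epoch range and truncating fractional values to an integer are both assumptions of this example, not behavior confirmed by the text above. It also assumes `end_epoch > start_epoch`.

```python
def rollout_length(epoch, schedule):
    """Linearly interpolate a rollout length from a length-4 schedule."""
    start_epoch, end_epoch, start_length, end_length = schedule

    # Before the schedule starts, hold the initial length (assumed).
    if epoch <= start_epoch:
        return int(start_length)

    # Fraction of the way through the schedule, capped at 1 so the
    # length stays at end_length after end_epoch (assumed).
    frac = min((epoch - start_epoch) / (end_epoch - start_epoch), 1.0)

    # Truncate to an integer number of model steps (assumed rounding).
    return int(start_length + frac * (end_length - start_length))


schedule = [20, 100, 1, 5]
for epoch in (0, 20, 40, 60, 100, 150):
    print(epoch, rollout_length(epoch, schedule))
```

Under these assumptions, the length stays at 1 through epoch 20, reaches 3 at epoch 60, and stays at 5 from epoch 100 onward.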

If you want to speed up training in terms of wall clock time (but possibly make the runs less sample-efficient), you can set a timeout for model training (

, in seconds) or train the model less frequently (every

Comparing to MBPO

If you would like to compare to MBPO but do not have the resources to re-run all experiments, the learning curves found in Figure 2 of the paper (plus on the Humanoid environment) are available in this shared folder. See
for an example of how to read the pickle files with the results.
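The repository's own example remains the authoritative reference for the layout of these files. As a generic starting point only, the sketch below loads a pickle file and summarizes its top-level structure. The helper names `load_results` and `describe` are hypothetical, and nothing is assumed about the keys stored inside the files.

```python
import pickle


def load_results(path):
    """Load one pickled results file. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def describe(obj):
    """Summarize the top-level structure without assuming any keys."""
    if isinstance(obj, dict):
        # Map each top-level key to the type name of its value.
        return {key: type(value).__name__ for key, value in obj.items()}
    return type(obj).__name__
```

Calling `describe(load_results(path))` on a downloaded file is one way to see what it contains before plotting. If the files store library-specific objects, that library must be installed for unpickling to succeed.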


@inproceedings{janner2019mbpo,
  author = {Michael Janner and Justin Fu and Marvin Zhang and Sergey Levine},
  title = {When to Trust Your Model: Model-Based Policy Optimization},
  booktitle = {Advances in Neural Information Processing Systems},
  year = {2019}
}


The underlying soft actor-critic implementation in MBPO comes from Tuomas Haarnoja and Kristian Hartikainen's softlearning codebase. The modeling code is a slightly modified version of Kurtland Chua's PETS implementation.
