phillipi/pix2pix

Image-to-image translation with conditional adversarial nets

Project | Arxiv | PyTorch

Torch implementation for learning a mapping from input images to output images, for example:

Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
CVPR, 2017.

On some tasks, decent results can be obtained fairly quickly and on small datasets. For example, to learn to generate facades (example shown above), we trained on just 400 images for about 2 hours (on a single Pascal Titan X GPU). However, for harder problems it may be important to train on far larger datasets, and for many hours or even days.

Note: Please check out our PyTorch implementation for pix2pix and CycleGAN. The PyTorch version is under active development and can produce results comparable to or better than this Torch version.



Prerequisites

  • Linux or OSX
  • NVIDIA GPU + CUDA CuDNN (CPU mode and CUDA without CuDNN may work with minimal modification, but untested)

Getting Started

  • Install torch and dependencies from https://github.com/torch/distro
  • Install the torch packages nngraph and display:

    ```bash
    luarocks install nngraph
    luarocks install display
    ```
  • Clone this repo:

    ```bash
    git clone git@github.com:phillipi/pix2pix.git
    cd pix2pix
    ```
  • Download the dataset (e.g., CMP Facades):

    ```bash
    bash ./datasets/download_dataset.sh facades
    ```
  • Train the model:

    ```bash
    DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA th train.lua
    ```
  • (CPU only) The same training command without a GPU or CuDNN. Setting the environment variables gpu=0 cudnn=0 forces CPU-only mode:

    ```bash
    DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA gpu=0 cudnn=0 batchSize=10 save_epoch_freq=5 th train.lua
    ```

(Optionally) start the display server to view results as the model trains (see Display UI for more details):

```bash
th -ldisplay.start 8000
```

  • Finally, test the model:

    ```bash
    DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA phase=val th test.lua
    ```

The test results will be saved to an html file here: ./results/facades_generation/latest_net_G_val/index.html

Train

Train a model:

```bash
DATA_ROOT=/path/to/data/ name=expt_name which_direction=AtoB th train.lua
```





Switch AtoB to BtoA to train translation in the opposite direction.

Models are saved to ./checkpoints/expt_name (can be changed by passing checkpoint_dir=your_dir in train.lua).

See opt in train.lua for additional training options.


Test

Test the model:

```bash
DATA_ROOT=/path/to/data/ name=expt_name which_direction=AtoB phase=val th test.lua
```

This will run the model named expt_name in direction which_direction on all images in /path/to/data/val.

Result images, and a webpage to view them, are saved to ./results/expt_name (can be changed by passing results_dir=your_dir in test.lua).

See opt in test.lua for additional testing options.


Datasets

Download the datasets using the following script. Some of the datasets are collected by other researchers. Please cite their papers if you use the data.

```bash
bash ./datasets/download_dataset.sh dataset_name
```

Models

Download the pre-trained models with the following script. You need to rename the model (e.g., facades_label2image to /checkpoints/facades/latest_net_G.t7) after the download has finished.

```bash
bash ./models/download_model.sh model_name
```

  • facades_label2image (label -> facade): trained on the CMP Facades dataset.
  • cityscapes_label2image (label -> street scene): trained on the Cityscapes dataset.
  • cityscapes_image2label (street scene -> label): trained on the Cityscapes dataset.
  • edges2shoes (edge -> photo): trained on the UT Zappos50K dataset.
  • edges2handbags (edge -> photo): trained on Amazon handbags images.
  • day2night (daytime scene -> nighttime scene): trained on around 100 webcams.

Setup Training and Test data

Generating Pairs

We provide a python script to generate training data in the form of pairs of images {A,B}, where A and B are two different depictions of the same underlying scene. For example, these might be pairs {label map, photo} or {bw image, color image}. Then we can learn to translate A to B or B to A:

Create folder /path/to/data with subfolders A and B. A and B should each have their own subfolders train, val, test, etc. In /path/to/data/A/train, put training images in style A. In /path/to/data/B/train, put the corresponding images in style B. Repeat the same for the other data splits (val, test, etc.).

Corresponding images in a pair {A,B} must be the same size and have the same filename, e.g., /path/to/data/A/train/1.jpg is considered to correspond to /path/to/data/B/train/1.jpg.
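This same-filename convention is easy to verify before training. The following is an illustrative sanity check, not part of the repository; `check_pairs` and the directory layout are assumptions matching the folder structure described above:

```python
import os

def check_pairs(data_root, split="train"):
    """Verify every image in A/<split> has a same-named partner in B/<split>.
    Returns (missing, extra): filenames in A without a partner in B, and vice versa."""
    a_dir = os.path.join(data_root, "A", split)
    b_dir = os.path.join(data_root, "B", split)
    a_files = set(os.listdir(a_dir))
    b_files = set(os.listdir(b_dir))
    missing = sorted(a_files - b_files)  # in A, no partner in B
    extra = sorted(b_files - a_files)    # in B, no partner in A
    return missing, extra
```

Running this on your data root before the combine step catches mismatched pairs early, when they are still cheap to fix.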



Once the data is formatted this way, call:

```bash
python scripts/combine_A_and_B.py --fold_A /path/to/data/A --fold_B /path/to/data/B --fold_AB /path/to/data
```

This will combine each pair of images (A,B) into a single image file, ready for training.
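The core of that combine step can be sketched as a width-wise concatenation of each aligned pair; this is a minimal numpy illustration (the actual combine script additionally walks the folders and reads/writes the image files):

```python
import numpy as np

def combine_pair(img_a, img_b):
    """Concatenate two aligned H x W x C images side by side
    into one H x 2W x C image, as used for paired training data."""
    assert img_a.shape == img_b.shape, "A and B must be the same size"
    return np.concatenate([img_a, img_b], axis=1)
```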

Notes on Colorization

No need to run combine_A_and_B.py for colorization. Instead, you need to prepare some natural images and set preprocess=colorization in the script. The program will automatically convert each RGB image into Lab color space, and create an L -> ab image pair during training. Also set input_nc=1 and output_nc=2.
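The L -> ab pairing amounts to splitting a Lab image into a one-channel input and a two-channel target, which is what input_nc=1 and output_nc=2 describe. A minimal sketch of that split, assuming the Lab conversion has already happened (the training code performs the actual RGB-to-Lab conversion internally):

```python
import numpy as np

def split_lab(lab_image):
    """Split an H x W x 3 Lab image into the L channel (network input)
    and the ab channels (network target)."""
    L = lab_image[:, :, :1]   # 1 channel: lightness
    ab = lab_image[:, :, 1:]  # 2 channels: color
    return L, ab
```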





Extracting Edges

We provide python and Matlab scripts to extract coarse edges from photos. Run scripts/edges/batch_hed.py to compute HED edges. Run scripts/edges/PostprocessHED.m to simplify edges with additional post-processing steps. Check the code documentation for more details.

Evaluating Labels2Photos on Cityscapes

We provide scripts for running the evaluation of the Labels2Photos task on the Cityscapes validation set. We assume that you have installed caffe (and pycaffe) on your system. If not, see the official website for installation instructions. Once caffe is successfully installed, download the pre-trained FCN-8s semantic segmentation model (512MB) by running

```bash
bash ./scripts/eval_cityscapes/download_fcn8s.sh
```

Then make sure ./scripts/eval_cityscapes/ is in your system's python path. If not, run the following command to add it:

```bash
export PYTHONPATH=${PYTHONPATH}:./scripts/eval_cityscapes/
```

Now you can run the following command to evaluate your predictions:

```bash
python ./scripts/eval_cityscapes/evaluate.py --cityscapes_dir /path/to/original/cityscapes/dataset/ --result_dir /path/to/your/predictions/ --output_dir /path/to/output/directory/
```

Images stored under --result_dir should contain your model predictions on the Cityscapes validation split, and have the original Cityscapes naming convention (e.g., frankfurt_000001_038418_leftImg8bit.png). The script will output a text file under --output_dir containing the metric.

Further notes: Our pre-trained FCN model is not supposed to work on Cityscapes in the original resolution (1024x2048) as it was trained on 256x256 images that are then upsampled to 1024x2048 during training. The purpose of the resizing during training was to 1) keep the label maps in the original high resolution untouched and 2) avoid the need to change the standard FCN training code and the architecture for Cityscapes. During test time, you need to synthesize 256x256 results. Our test code will automatically upsample your results to 1024x2048 before feeding them to the pre-trained FCN model. The output is at 1024x2048 resolution and will be compared to 1024x2048 ground truth labels. You do not need to resize the ground truth labels. The best way to verify whether everything is correct is to reproduce the numbers for real images in the paper first. To achieve it, you need to resize the original/real Cityscapes images (not labels) to 256x256 and feed them to the evaluation code.
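The resizing that the evaluation depends on can be pictured as plain block enlargement of the 256x256 prediction to the full 1024x2048 resolution. A minimal numpy sketch, assuming simple nearest-neighbor upsampling (the actual test code performs the upsampling for you, so this is only to make the resolution bookkeeping concrete):

```python
import numpy as np

def upsample_nearest(pred, target_h=1024, target_w=2048):
    """Nearest-neighbor upsample an H x W (x C) prediction to the full
    Cityscapes resolution. Target sizes must be exact multiples of H and W;
    note the two axes scale by different factors (256 -> 1024 vs 256 -> 2048)."""
    h, w = pred.shape[:2]
    assert target_h % h == 0 and target_w % w == 0
    return pred.repeat(target_h // h, axis=0).repeat(target_w // w, axis=1)
```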

Display UI

Optionally, for displaying images during training and test, use the display package.

  • Install it with:

    ```bash
    luarocks install display
    ```
  • Then start the server with:

    ```bash
    th -ldisplay.start
    ```
  • Open this URL in your browser: http://localhost:8000

By default, the server listens on localhost. Pass 0.0.0.0 to allow external connections on any interface:

```bash
th -ldisplay.start 8000 0.0.0.0
```

Then open http://(hostname):(port)/ in your browser to load the remote desktop.

L1 error is plotted to the display by default. Set the environment variable display_plot to a comma-separated list of the values errL1, errG, and errD to visualize the L1, generator, and discriminator error respectively. For example, to plot only the generator and discriminator errors to the display instead of the default L1 error, set display_plot="errG,errD".

Citation

If you use this code for your research, please cite our paper Image-to-Image Translation with Conditional Adversarial Networks:

```
@article{pix2pix2017,
  title={Image-to-Image Translation with Conditional Adversarial Networks},
  author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
  journal={CVPR},
  year={2017}
}
```

Cat Paper Collection

If you love cats, and love reading cool graphics, vision, and learning papers, please check out the Cat Paper Collection:
[Github] [Webpage]


Acknowledgments

Code borrows heavily from DCGAN. The data loader is modified from DCGAN and Context-Encoder.
