willwhitney / dc-ign


Deep Convolutional Inverse Graphics Network

This repository contains the code for the network described in http://arxiv.org/abs/1503.03167.

Use Cases:

  • Unsupervised Feature Learning
  • Neural 3D graphics engine: Given a static face image, our model can re-render (hallucinate) the face with arbitrary light and viewpoint transformations. Below is a sample movie generated by our model from a single face photograph -- this is achieved by varying the light neuron and obtaining the image frame prediction at each time step. The same can be done for pose variations (see the paper or project website).

A DC-IGN lighting demo

Project Website: http://willwhitney.github.io/dc-ign/www/
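The lighting movie above is produced by holding the inferred graphics code fixed and sweeping a single latent unit. Here is a minimal numpy sketch of that traversal, with a toy random linear decoder standing in for the trained deconvolutional decoder (the 200-d code, the 150x150 image size, and the index of the light neuron are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the trained DC-IGN decoder (the real one is a
# stack of deconvolutions); maps a 200-d graphics code to a 150x150 image.
W = rng.standard_normal((150 * 150, 200)) * 0.01

def toy_decode(z):
    return W.dot(z).reshape(150, 150)

# Graphics code inferred from a single face image (random here).
z = rng.standard_normal(200)

# Suppose index 0 is the light neuron: vary only that component and
# decode a frame at each step to get the "lighting movie".
frames = []
for light in np.linspace(-3.0, 3.0, num=20):
    z_t = z.copy()
    z_t[0] = light
    frames.append(toy_decode(z_t))

movie = np.stack(frames)  # shape: (20, 150, 150)
print(movie.shape)
```

Because every latent except the light neuron is held fixed, the frames differ only in illumination; the same loop over a pose neuron yields the pose movies on the project website.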

Citation

@article{kulkarni2015deep,
  title={Deep Convolutional Inverse Graphics Network},
  author={Kulkarni, Tejas D and Whitney, Will and Kohli, Pushmeet and Tenenbaum, Joshua B},
  journal={arXiv preprint arXiv:1503.03167},
  year={2015}
}

Running

Requirements

This code is written in Torch. Facebook has some great instructions for installing Torch and its dependencies over at https://github.com/facebook/fbcunn/blob/master/INSTALL.md

Instructions

Dataset and pre-trained network: The train/test dataset can be downloaded from Dropbox or Amazon S3.

A pretrained network is also available if you just want to see the results: Dropbox, Amazon S3

Update 06/23/16: We've been getting a bunch of traffic due to the (highly recommended!) InfoGAN paper, so I've mirrored the files on S3. If neither Dropbox nor S3 works, please email me ([email protected]) and I'll get it to you another way.

Training a network with separated pose/light/shape etc (disentangled representations)

  1. `git clone` this repo.
  2. Download the dataset and unzip it.
  3. Grab a coffee while you wait for that to happen. It's pretty big.
  4. Run `th monovariant_main.lua --help` to see the available options.
  5. To train from scratch:
    1. Run something like `th monovariant_main.lua --no_load --name my_first_dcign --datasetdir <dataset_dir>`.
    2. The network will save itself to `networks/` after each epoch.
    3. After a couple of epochs, open up `visualize_networks.lua` and set `network_search_str` to your network's name. Then you can run `th visualize_networks.lua` and it will create a folder called `renderings` with some visualizations of the kinds of faces your network generates.
  6. To use a pretrained network:
    1. Download the pretrained network and unzip it.
    2. More coffee while you wait.
    3. Run a command like `th monovariant_main.lua --import <pretrained_network_dir> --name my_first_dcign --datasetdir <dataset_dir>`, which imports the directory of that pretrained net.
    4. Or just do the `visualize_networks` thing from above with the pretrained network to see what it makes.
  7. The default will run on the CPU. To enable CUDA, pass `--useCuda --deviceId deviceToUse` (the default `deviceId` is `1`). For cuDNN, use `--useCuda --useCudnn --deviceId deviceToUse`.

Training a network with undifferentiated latents

Instructions coming soon, but if you're not afraid of code that hasn't been cleaned up yet, check out `main.lua`.

Paper abstract

This paper presents the Deep Convolution Inverse Graphics Network (DC-IGN) that aims to learn an interpretable representation of images that is disentangled with respect to various transformations such as object out-of-plane rotations, lighting variations, and texture. The DC-IGN model is composed of multiple layers of convolution and de-convolution operators and is trained using the Stochastic Gradient Variational Bayes (SGVB) algorithm. We propose training procedures to encourage neurons in the graphics code layer to have semantic meaning and force each group to distinctly represent a specific transformation (pose, light, texture, shape etc.). Given a static face image, our model can re-generate the input image with different pose, lighting or even texture and shape variations from the base face. We present qualitative and quantitative results of the model’s efficacy to learn a 3D rendering engine. Moreover, we also utilize the learnt representation for two important visual recognition tasks: (1) an invariant face recognition task and (2) using the representation as a summary statistic for generative modeling.
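The SGVB training mentioned in the abstract boils down to a reparameterized sample from the approximate posterior q(z|x), a reconstruction term, and a closed-form Gaussian KL penalty. Here is a minimal numpy sketch of that objective for one example, with toy random stand-ins for the encoder and decoder (all dimensions and the linear decoder are illustrative assumptions, not the paper's convolutional architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "encoder" output for one image: mean and log-variance of the
# approximate posterior q(z|x) over a 200-d graphics code.
mu = rng.standard_normal(200) * 0.1
logvar = rng.standard_normal(200) * 0.1

# Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I),
# so gradients can flow through the sampling step.
eps = rng.standard_normal(200)
z = mu + np.exp(0.5 * logvar) * eps

# Closed-form KL( q(z|x) || N(0, I) ), summed over latent dimensions.
kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))

# Toy linear "decoder" and squared-error reconstruction term
# (the real decoder is deconvolutional and trained jointly).
x = rng.standard_normal(100)              # a flattened "image"
W = rng.standard_normal((100, 200)) * 0.01
recon = W.dot(z)
recon_loss = np.sum((x - recon) ** 2)

# SGVB maximizes the ELBO, i.e. minimizes recon_loss + kl.
loss = recon_loss + kl
print(float(loss))
```

The disentangling itself comes not from this objective but from how mini-batches are constructed (only one scene variable changes within a batch) and how the inactive latents are clamped during training; see the paper for that procedure.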

Acknowledgements

A big shout-out to all the Torch developers. Torch is simply awesome. We thank Thomas Vetter for giving us access to the Basel face model. T. Kulkarni was graciously supported by the Leventhal Fellowship. This research was supported by ONR award N000141310333, ARO MURI W911NF-13-1-2012 and CBMM. We would also like to thank y0ast (https://github.com/y0ast) for making the variational autoencoder code available online.
