Deep Networks with Stochastic Depth
This repository hosts the Torch 7 code for the paper Deep Networks with Stochastic Depth, available at http://arxiv.org/abs/1603.09382. For now, the code reproduces the results in Figure 3 for CIFAR-10 and CIFAR-100, and Figure 4 (left) for SVHN. The code for the 1202-layer network can be easily adapted from the fb.resnet.torch repo using our provided module for stochastic depth.
Please see the latest implementation of stochastic depth and other cool models (DenseNet etc.) in PyTorch, by Felix Wu and Danlu Chen. Their code is much more memory efficient, more user friendly and better maintained. The 1202-layer architecture on CIFAR-10 can be trained on one TITAN X (amazingly!) under our standard settings.
luarocks install nninit should do the trick.
git clone https://github.com/yueatsprograms/Stochastic_Depth
cd Stochastic_Depth
git clone https://github.com/soumith/cifar.torch
cd cifar.torch
th Cifar10BinToTensor.lua
cd ..
mkdir results
th main.lua -dataRoot cifar.torch/ -resultFolder results/ -deathRate 0.5
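Before running the commands above, it can help to confirm that the tools they call are installed. The following is a minimal sketch and not part of the repository: the require_tool helper is a hypothetical name, and the tool list (git, luarocks, th) is simply taken from the commands in this README.

```shell
#!/bin/sh
# Hypothetical preflight check; not part of the repository.

# Return 0 if the named command is on PATH; otherwise report it and return 1.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing: $1" >&2
    return 1
}

# Check each tool used by the commands in this README and count the absent ones.
missing=0
for tool in git luarocks th; do
    require_tool "$tool" || missing=$((missing + 1))
done
echo "tools missing: $missing"
```

If the count is not zero, install the reported tools before cloning and training.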
th main.lua -dataRoot path_to_data -resultFolder path_to_save -deathRate 0.5
The -device flag allows you to specify which GPU to run on. On our machine with a TITAN X, each epoch takes about 60 seconds, and the program ends with a test error (selected by best validation error) of 5.25%.
The default deathRate is set to 0. This is equivalent to a constant depth network, so to run our baseline, enter:
th main.lua -dataRoot path_to_data -resultFolder path_to_save
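To compare the baseline against stochastic depth, the two commands above can be wrapped in a small script. This is a sketch under assumptions: run_experiment, DATA_ROOT, RESULT_ROOT, and DRY_RUN are hypothetical names introduced here, and the per-run result subfolders are an illustrative layout rather than something the repository requires. Only the th main.lua flags come from this README. By default the script just prints the commands; setting DRY_RUN=0 would launch them.

```shell
#!/bin/sh
# Hypothetical wrapper; only the th main.lua flags are taken from this README.
DATA_ROOT="${DATA_ROOT:-cifar.torch/}"
RESULT_ROOT="${RESULT_ROOT:-results}"
DRY_RUN="${DRY_RUN:-1}"

# Print (DRY_RUN=1) or launch (DRY_RUN=0) one training run.
# $1 is the death rate, $2 is a label used as the result subfolder name.
run_experiment() {
    death_rate="$1"
    label="$2"
    out_dir="$RESULT_ROOT/$label/"
    if [ "$DRY_RUN" = "1" ]; then
        echo "th main.lua -dataRoot $DATA_ROOT -resultFolder $out_dir -deathRate $death_rate"
    else
        # Create the result folder first, as the quick-start does with mkdir.
        mkdir -p "$out_dir"
        th main.lua -dataRoot "$DATA_ROOT" -resultFolder "$out_dir" -deathRate "$death_rate"
    fi
}

# A death rate of 0 is the constant-depth baseline; 0.5 is the setting used above.
run_experiment 0 baseline
run_experiment 0.5 stochastic
```

Writing each run to its own folder keeps the per-epoch error files of the two runs from being mixed together.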
You can run on CIFAR-100 by adding the flag -dataset cifar100. Our program provides other options, for example, your network depth (-N), data augmentation (-augmentation), batch size (-batchSize), etc. You can change the optimization hyperparameters in the sgdState variable, and the learning rate schedule in the main function. The program saves a file every epoch in the result folder (the file name includes the deathRate), which has a table of tuples containing your test and validation errors up to that epoch.
The architecture and number of epochs for SVHN used in our paper are slightly different from the code's defaults. To replicate our result of 1.75% on SVHN, use the following command:
th main.lua -dataRoot path_to_data -resultFolder path_to_save -dataset svhn -N 25 -maxEpochs 50 -deathRate 0.5
model:add(cudnn.SpatialBatchNormalization(_dim_):init('weight', nninit.normal, 1.0, 0.002):init('bias', nninit.constant, 0))
We could not replicate the non-convergence and thus won't put this initialization into our code, but we recognize that machines (or the versions of Torch installed) might be different.
My email is ys646 at cornell.edu. I'm happy to answer any of your questions, and I'd very much appreciate your suggestions. My academic website is at http://yueatsprograms.github.io.