PyTorch implementation of DeepLab v2 on COCO-Stuff / PASCAL VOC
This is an unofficial PyTorch implementation of DeepLab v2 [1] with a ResNet-101 backbone.

* COCO-Stuff dataset [2] and PASCAL VOC dataset [3] are supported.
* The official Caffe weights provided by the authors can be used without building the Caffe APIs.
* DeepLab v3/v3+ models with the identical backbone are also included (not tested).
* `torch.hub` is supported.
| Train set | Eval set | Code | Weight | CRF? | Pixel Accuracy | Mean Accuracy | Mean IoU | FreqW IoU |
|---|---|---|---|---|---|---|---|---|
| 10k train † | 10k val † | Official [2] | | | 65.1 | 45.5 | 34.4 | 50.4 |
| | | This repo | Download | | 65.8 | 45.7 | 34.8 | 51.2 |
| | | | | ✓ | 67.1 | 46.4 | 35.6 | 52.5 |
| 164k train | 164k val | This repo | Download ‡ | | 66.8 | 51.2 | 39.1 | 51.5 |
| | | | | ✓ | 67.6 | 51.5 | 39.7 | 52.3 |
† Images and labels are pre-warped to a square shape of 513x513.
‡ Note for SPADE followers: The provided COCO-Stuff 164k weight has been kept intact since 2019/02/23.
| Train set | Eval set | Code | Weight | CRF? | Pixel Accuracy | Mean Accuracy | Mean IoU | FreqW IoU |
|---|---|---|---|---|---|---|---|---|
| trainaug | val | Official [3] | | | - | - | 76.35 | - |
| | | | | ✓ | - | - | 77.69 | - |
| | | This repo | Download | | 94.64 | 86.50 | 76.65 | 90.41 |
| | | | | ✓ | 95.04 | 86.64 | 77.93 | 91.06 |
Required Python packages are listed in the Anaconda configuration file `configs/conda_env.yaml`. Please modify the listed `cudatoolkit=10.2` and `python=3.6` as needed and run the following commands.
```sh
# Set up with Anaconda
conda env create -f configs/conda_env.yaml
conda activate deeplab-pytorch
```
Caffe models pre-trained on the COCO and PASCAL VOC datasets are released by the DeepLab authors. In accordance with the papers [1,2], this repository uses the COCO-trained parameters as initial weights.
```sh
$ bash scripts/setup_caffemodels.sh
```
# Generate "deeplabv1_resnet101-coco.pth" from "init.caffemodel" $ python convert.py --dataset coco # Generate "deeplabv2_resnet101_msc-vocaug.pth" from "train2_iter_20000.caffemodel" $ python convert.py --dataset voc12
To train DeepLab v2 on PASCAL VOC 2012:
```sh
python main.py train \
    --config-path configs/voc12.yaml
```
To evaluate the performance on a validation set:
```sh
python main.py test \
    --config-path configs/voc12.yaml \
    --model-path data/models/voc12/deeplabv2_resnet101_msc/train_aug/checkpoint_final.pth
```
Note: This command saves the predicted logit maps (`.npy`) and the scores (`.json`).
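As a rough sketch (not part of this repository), a saved logit map could be converted to a label map with NumPy; the file path and the `(num_classes, height, width)` layout below are assumptions, so check the files the command above actually writes.

```python
import numpy as np

# Hypothetical path: point this at one of the .npy files written by the test command above.
logit = np.load("path/to/saved_logit.npy")

# Assuming logits are stored as (num_classes, height, width), the predicted
# label map is the per-pixel argmax over the class axis.
label_map = logit.argmax(axis=0)
print(label_map.shape)
```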
To re-evaluate with CRF post-processing:
```sh
python main.py crf \
    --config-path configs/voc12.yaml
```
Running the above commands in sequence is equivalent to `bash scripts/train_eval.sh`.
To monitor the loss, run the following command in a separate terminal.
```sh
tensorboard --logdir data/logs
```
Please specify the appropriate configuration files for the other datasets.
| Dataset | Config file | #Iterations | Classes |
| :-------------- | :--------------------------- | :---------- | :--------------------------- |
| PASCAL VOC 2012 | `configs/voc12.yaml` | 20,000 | 20 foreground + 1 background |
| COCO-Stuff 10k | `configs/cocostuff10k.yaml` | 20,000 | 182 thing/stuff |
| COCO-Stuff 164k | `configs/cocostuff164k.yaml` | 100,000 | 182 thing/stuff |
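For example, training on COCO-Stuff 164k follows the same pattern as the PASCAL VOC command above, with only the config file swapped:

```sh
python main.py train \
    --config-path configs/cocostuff164k.yaml
```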
Note: Although the label indices range from 0 to 181 in COCO-Stuff 10k/164k, only 171 classes are supervised.
Common settings:

- GPU: Specify the GPUs to use with `CUDA_VISIBLE_DEVICES=`.
- Multi-scale loss: Loss is defined as a sum of responses from multi-scale inputs (1x, 0.75x, 0.5x) and the element-wise max across the scales. The unlabeled class is ignored in the loss computation.
- Gradient accumulation: A mini-batch of 10 samples is not processed at once due to high GPU memory usage. Instead, gradients of small batches of 5 samples are accumulated over 2 iterations, and the weight update is performed at the end (`batch_size * iter_size = 10`; see the sketch after this list). GPU memory usage is approx. 11.2 GB with the default settings (tested on a single Titan X). You can reduce it with a smaller `batch_size`.
- Learning rate: Stochastic gradient descent (SGD) is used with a momentum of 0.9 and an initial learning rate of 2.5e-4. Polynomial learning rate decay is employed; every 10 iterations, the learning rate is scaled by `(1 - iter / iter_max) ** power`.
- Monitoring: The moving-average loss (`average_loss` in Caffe) can be monitored in TensorBoard.
- Preprocessing: Input images are randomly re-scaled by factors ranging from 0.5 to 1.5, padded if needed, and randomly cropped to 321x321.
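To make the gradient accumulation and polynomial decay above concrete, here is a minimal PyTorch sketch rather than this repository's actual training loop; the dummy model, dummy data, and the decay exponent `power = 0.9` are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative stand-ins; the real model, data loader, and optimizer setup live in main.py.
model = nn.Conv2d(3, 21, kernel_size=1)
base_lr, power = 2.5e-4, 0.9                      # power=0.9 is an assumed decay exponent
iter_max, iter_size, batch_size = 20000, 2, 5     # effective batch: batch_size * iter_size = 10
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)

for it in range(1, iter_max + 1):
    # Polynomial decay, recomputed every 10 iterations as described above.
    if it % 10 == 0:
        for group in optimizer.param_groups:
            group["lr"] = base_lr * (1 - it / iter_max) ** power

    optimizer.zero_grad()
    # Gradient accumulation: two small batches of 5 samples before one weight update.
    for _ in range(iter_size):
        images = torch.randn(batch_size, 3, 321, 321)              # dummy 321x321 crops
        targets = torch.randint(0, 21, (batch_size, 321, 321))     # dummy label maps
        loss = F.cross_entropy(model(images), targets, ignore_index=255)
        (loss / iter_size).backward()                              # average over the accumulation steps
    optimizer.step()
```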
Processed images and labels in COCO-Stuff 164k:
You can use the pre-trained models, the converted models, or your own models.
To process a single image:
```sh
python demo.py single \
    --config-path configs/voc12.yaml \
    --model-path deeplabv2_resnet101_msc-vocaug-20000.pth \
    --image-path image.jpg
```
To run on a webcam:
```sh
python demo.py live \
    --config-path configs/voc12.yaml \
    --model-path deeplabv2_resnet101_msc-vocaug-20000.pth
```
To apply CRF post-processing, add `--crf`. To run on a CPU, add `--cpu`.
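For example, a sketch combining both flags with the single-image command above (assuming the two options can be used together):

```sh
python demo.py single \
    --config-path configs/voc12.yaml \
    --model-path deeplabv2_resnet101_msc-vocaug-20000.pth \
    --image-path image.jpg \
    --crf \
    --cpu
```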
Model setup with two lines
```python
import torch.hub

model = torch.hub.load("kazuto1011/deeplab-pytorch", "deeplabv2_resnet101", pretrained="cocostuff164k", n_classes=182)
```
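As a rough follow-up usage sketch (not taken from this repository's demo code), the loaded model can be applied to an image tensor; the dummy input and upsampling step below are assumptions, and the exact Caffe-style BGR mean-subtraction preprocessing should be taken from `demo.py`.

```python
import torch
import torch.nn.functional as F

model.eval()

# Dummy tensor standing in for a preprocessed image; demo.py applies Caffe-style
# preprocessing (BGR channel order, mean subtraction), which is only assumed here.
image = torch.randn(1, 3, 513, 513)

with torch.no_grad():
    logits = model(image)                                  # coarse class logits
    logits = F.interpolate(logits, size=image.shape[2:],
                           mode="bilinear", align_corners=False)
    labels = logits.argmax(dim=1)                          # (1, 513, 513) label map

print(labels.shape)
```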
Differences from the official Caffe implementation:

- While the official code employs bilinear interpolation (the `Interp` layer) to downsample a label only for the 0.5x input, this codebase does so for both the 0.5x and 0.75x inputs with nearest interpolation (`PIL.Image.resize`; see the related issue).
- Bilinear interpolation on images and logits is performed with `align_corners=False`.
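For illustration only, here is a minimal sketch of downsampling a label map with nearest interpolation via `PIL.Image.resize`, as described above; the label size and scale factors are assumptions.

```python
import numpy as np
from PIL import Image

# Dummy 321x321 label map with class indices, standing in for a real annotation.
label = np.random.randint(0, 21, size=(321, 321), dtype=np.uint8)

# Downsample the label for the 0.5x and 0.75x branches with nearest interpolation,
# so that class indices are never blended by averaging.
for scale in (0.5, 0.75):
    h, w = [int(round(s * scale)) for s in label.shape]
    small = Image.fromarray(label).resize((w, h), resample=Image.NEAREST)
    print(scale, np.asarray(small).shape)
```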
This codebase only supports DeepLab v2 training, which freezes the batch normalization layers, whereas the v3/v3+ protocols require training them. If you also want to train batch normalization parameters across multiple GPUs in your own projects, please install the extra library below.
```sh
pip install torch-encoding
```
Batch normalization layers in a model are automatically switched in `libs/models/resnet.py`.
```python
import torch.nn as nn

try:
    # Use synchronized batch normalization across GPUs if torch-encoding is installed.
    from encoding.nn import SyncBatchNorm
    _BATCH_NORM = SyncBatchNorm
except ImportError:
    # Fall back to the standard per-GPU batch normalization.
    _BATCH_NORM = nn.BatchNorm2d
```
1. L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE TPAMI, 2018. Project / Code / arXiv paper
2. H. Caesar, J. Uijlings, V. Ferrari. COCO-Stuff: Thing and Stuff Classes in Context. In CVPR, 2018. Project / arXiv paper
3. M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, A. Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. IJCV, 2010. Project / Paper