About the developer

kazuto1011
790 Stars 219 Forks MIT License 102 Commits 5 Opened issues

Description

PyTorch implementation of DeepLab v2 on COCO-Stuff / PASCAL VOC


DeepLab with PyTorch

This is an unofficial PyTorch implementation of DeepLab v2 [1] with a ResNet-101 backbone.

  • COCO-Stuff dataset [2] and PASCAL VOC dataset [3] are supported.
  • The official Caffe weights provided by the authors can be used without building the Caffe APIs.
  • DeepLab v3/v3+ models with the identical backbone are also included (not tested).
  • `torch.hub` is supported.

Performance

COCO-Stuff

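| Train set | Eval set | Code | Weight | CRF? | Pixel Accuracy | Mean Accuracy | Mean IoU | FreqW IoU |
| :---------- | :-------- | :----------- | :--------- | :--- | :------------- | :------------ | :------- | :-------- |
| 10k train † | 10k val † | Official [2] | | | 65.1 | 45.5 | 34.4 | 50.4 |
| 10k train † | 10k val † | This repo | Download | | 65.8 | 45.7 | 34.8 | 51.2 |
| 10k train † | 10k val † | This repo | Download | ✓ | 67.1 | 46.4 | 35.6 | 52.5 |
| 164k train | 164k val | This repo | Download ‡ | | 66.8 | 51.2 | 39.1 | 51.5 |
| 164k train | 164k val | This repo | Download ‡ | ✓ | 67.6 | 51.5 | 39.7 | 52.3 |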

† Images and labels are pre-warped to square-shape 513x513
‡ Note for SPADE followers: The provided COCO-Stuff 164k weight has been kept intact since 2019/02/23.

PASCAL VOC 2012

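| Train set | Eval set | Code | Weight | CRF? | Pixel Accuracy | Mean Accuracy | Mean IoU | FreqW IoU |
| :-------- | :------- | :----------- | :------- | :--- | :------------- | :------------ | :------- | :-------- |
| trainaug | val | Official [3] | | | - | - | 76.35 | - |
| trainaug | val | Official [3] | | ✓ | - | - | 77.69 | - |
| trainaug | val | This repo | Download | | 94.64 | 86.50 | 76.65 | 90.41 |
| trainaug | val | This repo | Download | ✓ | 95.04 | 86.64 | 77.93 | 91.06 |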
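
For readers unfamiliar with the four score columns, the following is a minimal sketch of how they are commonly derived from a confusion matrix. It uses the standard definitions and is not taken from this repository's code; the function name `segmentation_scores` and the choice to skip classes absent from the ground truth are assumptions of this sketch.

```python
def segmentation_scores(confusion):
    """Compute the four table metrics from a square confusion matrix.

    confusion[i][j] counts pixels whose ground-truth class is i and whose
    predicted class is j. Hypothetical helper, written for illustration.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    diag = [confusion[i][i] for i in range(n)]                 # correctly labeled pixels
    gt_count = [sum(confusion[i]) for i in range(n)]           # pixels per true class
    pred_count = [sum(confusion[i][j] for i in range(n)) for j in range(n)]

    # Assumption: classes that never occur in the ground truth are skipped.
    present = [i for i in range(n) if gt_count[i] > 0]

    pixel_acc = sum(diag) / total
    mean_acc = sum(diag[i] / gt_count[i] for i in present) / len(present)

    # IoU = intersection / union = TP / (TP + FN + FP)
    iou = {i: diag[i] / (gt_count[i] + pred_count[i] - diag[i]) for i in present}
    mean_iou = sum(iou.values()) / len(iou)

    # Frequency-weighted IoU weights each class IoU by its share of pixels.
    freq_w_iou = sum(gt_count[i] / total * iou[i] for i in present)

    return {
        "Pixel Accuracy": pixel_acc,
        "Mean Accuracy": mean_acc,
        "Mean IoU": mean_iou,
        "FreqW IoU": freq_w_iou,
    }


# Toy two-class example: 10 pixels, 8 of them labeled correctly.
scores = segmentation_scores([[3, 1], [1, 5]])
for name, value in scores.items():
    print(f"{name}: {100 * value:.2f}")
```

The table values are percentages, which is why the toy example multiplies by 100 before printing.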

Setup

Requirements

Required Python packages are listed in the Anaconda configuration file `configs/conda_env.yaml`. Please modify the listed `cudatoolkit=10.2` and `python=3.6` as needed and run the following commands.
# Set up with Anaconda
conda env create -f configs/conda_env.yaml
conda activate deeplab-pytorch

Download datasets

Download pre-trained caffemodels

Caffemodels pre-trained on COCO and PASCAL VOC datasets are released by the DeepLab authors. In accordance with the papers [1,2], this repository uses the COCO-trained parameters as initial weights.

  1. Run the following script to download the pre-trained caffemodels (1GB+).
$ bash scripts/setup_caffemodels.sh
  2. Convert the caffemodels to PyTorch-compatible weights. No need to build the Caffe API!
# Generate "deeplabv1_resnet101-coco.pth" from "init.caffemodel"
$ python convert.py --dataset coco
# Generate "deeplabv2_resnet101_msc-vocaug.pth" from "train2_iter_20000.caffemodel"
$ python convert.py --dataset voc12

Training & Evaluation

To train DeepLab v2 on PASCAL VOC 2012:

python main.py train \
    --config-path configs/voc12.yaml

To evaluate the performance on a validation set:

python main.py test \
    --config-path configs/voc12.yaml \
    --model-path data/models/voc12/deeplabv2_resnet101_msc/train_aug/checkpoint_final.pth

Note: This command saves the predicted logit maps (`.npy`) and the scores (`.json`).

To re-evaluate with CRF post-processing:

python main.py crf \
    --config-path configs/voc12.yaml

Running the above scripts in sequence is equivalent to `bash scripts/train_eval.sh`.

To monitor the loss, run the following command in a separate terminal.

tensorboard --logdir data/logs

Please specify the appropriate configuration files for the other datasets.

| Dataset | Config file | #Iterations | Classes |
| :-------------- | :--------------------------- | :---------- | :--------------------------- |
| PASCAL VOC 2012 | `configs/voc12.yaml` | 20,000 | 20 foreground + 1 background |
| COCO-Stuff 10k | `configs/cocostuff10k.yaml` | 20,000 | 182 thing/stuff |
| COCO-Stuff 164k | `configs/cocostuff164k.yaml` | 100,000 | 182 thing/stuff |

Note: Although the label indices range from 0 to 181 in COCO-Stuff 10k/164k, only 171 classes are supervised.

Common settings:

  • Model: DeepLab v2 with ResNet-101 backbone. Dilation rates of ASPP are (6, 12, 18, 24). Output stride is 8.
  • GPU: All the GPUs visible to the process are used. Please specify the scope with `CUDA_VISIBLE_DEVICES=`.
  • Multi-scale loss: The loss is defined as a sum of responses from multi-scale inputs (1x, 0.75x, 0.5x) and the element-wise max across the scales. The unlabeled class is ignored in the loss computation.
  • Gradient accumulation: The mini-batch of 10 samples is not processed at once because of its high GPU memory usage. Instead, gradients of small batches of 5 samples are accumulated for 2 iterations, and the weights are updated at the end (`batch_size * iter_size = 10`). GPU memory usage is approx. 11.2 GB with the default setting (tested on a single Titan X). You can reduce it with a smaller `batch_size`.
  • Learning rate: Stochastic gradient descent (SGD) is used with momentum of 0.9 and initial learning rate of 2.5e-4. Polynomial learning rate decay is employed; the learning rate is multiplied by `(1-iter/iter_max)**power` every 10 iterations.
  • Monitoring: Moving average loss (`average_loss` in Caffe) can be monitored in TensorBoard.
  • Preprocessing: Input images are randomly re-scaled by factors ranging from 0.5 to 1.5, padded if needed, and randomly cropped to 321x321.
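
The gradient accumulation and learning rate settings above can be illustrated with a small, framework-free sketch. It is not the repository's training loop. The function names, the toy 1-D model, the value `power=0.9`, and the scaling of each sub-batch gradient by `1/iter_size` are all assumptions made for illustration. In PyTorch, the same effect is usually obtained by calling `backward()` once per sub-batch before a single optimizer step.

```python
def poly_lr(base_lr, iteration, max_iter, power=0.9, step=10):
    """Polynomial decay that is refreshed only every `step` iterations.

    Assumption: the factor (1 - iter/iter_max)**power scales the initial rate,
    and power=0.9 is a placeholder value, not one stated in the text above.
    """
    held = iteration - iteration % step   # last iteration at which the rate changed
    return base_lr * (1 - held / max_iter) ** power


def mse_gradient(w, xs, ys):
    """Gradient of the mean squared error of the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_gradient(w, xs, ys, batch_size, iter_size):
    """Accumulate sub-batch gradients, each scaled by 1/iter_size (assumption)."""
    grad = 0.0
    for k in range(iter_size):
        lo = k * batch_size
        sub_x, sub_y = xs[lo:lo + batch_size], ys[lo:lo + batch_size]
        grad += mse_gradient(w, sub_x, sub_y) / iter_size
    return grad


# Ten samples processed at once versus as 2 sub-batches of 5.
xs = [float(i) for i in range(10)]
ys = [3.0 * x + 1.0 for x in xs]
full = mse_gradient(0.5, xs, ys)
accumulated = accumulated_gradient(0.5, xs, ys, batch_size=5, iter_size=2)
print("full-batch gradient:", full)
print("accumulated gradient:", accumulated)

# Learning rate at a few iterations of a 20,000-iteration schedule.
for it in (0, 5, 10, 10000, 19990):
    print(it, poly_lr(2.5e-4, it, 20000))
```

With equally sized sub-batches, the accumulated gradient matches the full-batch gradient, which is why the smaller `batch_size` reduces memory without changing the effective mini-batch of 10.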

[Figure: processed images and labels in COCO-Stuff 164k]

Inference Demo

You can use the pre-trained models, the converted models, or your models.

To process a single image:

python demo.py single \
    --config-path configs/voc12.yaml \
    --model-path deeplabv2_resnet101_msc-vocaug-20000.pth \
    --image-path image.jpg

To run on a webcam:

python demo.py live \
    --config-path configs/voc12.yaml \
    --model-path deeplabv2_resnet101_msc-vocaug-20000.pth

To run CRF post-processing, add `--crf`. To run on a CPU, add `--cpu`.

Misc

torch.hub

Model setup with two lines

import torch.hub
model = torch.hub.load("kazuto1011/deeplab-pytorch", "deeplabv2_resnet101", pretrained='cocostuff164k', n_classes=182)

Difference with Caffe version

  • While the official code employs 1/16 bilinear interpolation (`Interp` layer) to downsample the label for the 0.5x input only, this codebase downsamples the labels for both the 0.5x and 0.75x inputs with nearest interpolation (`PIL.Image.resize`, related issue).
  • Bilinear interpolation on images and logits is performed with `align_corners=False`.
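
To see why nearest interpolation suits label maps, consider the minimal pure-Python sketch below. It is not the repository's code, which relies on `PIL.Image.resize`. The helper name `resize_nearest` and the pixel-centre sampling convention are assumptions of this sketch, so its output is not guaranteed to match PIL pixel for pixel. The point is that nearest sampling only copies existing class indices, whereas averaging neighbours (for example 0 and 15) would produce an index such as 7.5 that corresponds to no real class.

```python
def resize_nearest(label, out_h, out_w):
    """Resize a 2-D label map (list of rows) with nearest-neighbour sampling.

    Hypothetical helper: each output pixel copies the input pixel whose
    position is closest to the output pixel's centre.
    """
    in_h, in_w = len(label), len(label[0])
    out = []
    for r in range(out_h):
        src_r = min(int((r + 0.5) * in_h / out_h), in_h - 1)
        row = []
        for c in range(out_w):
            src_c = min(int((c + 0.5) * in_w / out_w), in_w - 1)
            row.append(label[src_r][src_c])
        out.append(row)
    return out


# 4x4 label map with classes 0 and 15 plus the ignore index 255.
label = [
    [0, 0, 15, 15],
    [0, 0, 15, 15],
    [255, 255, 15, 15],
    [255, 255, 0, 0],
]
half = resize_nearest(label, 2, 2)
print(half)

# Every value in the result already existed in the input.
original_classes = {v for row in label for v in row}
print(all(v in original_classes for row in half for v in row))
```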

Training batch normalization

This codebase only supports DeepLab v2 training, which freezes the batch normalization layers, although the v3/v3+ protocols require training them. If you also want to train their parameters on multiple GPUs in your own projects, please install the extra library below.

pip install torch-encoding

Batch normalization layers in a model are automatically switched in `libs/models/resnet.py`.
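import torch.nn as nn

try:
    # Use synchronized batch norm when torch-encoding is installed
    from encoding.nn import SyncBatchNorm
    _BATCH_NORM = SyncBatchNorm
except Exception:
    # Otherwise fall back to the standard PyTorch layer
    _BATCH_NORM = nn.BatchNorm2d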

References

  1. L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE TPAMI, 2018.

  2. H. Caesar, J. Uijlings, V. Ferrari. COCO-Stuff: Thing and Stuff Classes in Context. In CVPR, 2018.

  3. M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, A. Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. IJCV, 2010.
