by lingtengqiu

lingtengqiu / Deeperlab-pytorch

A PyTorch implementation of DeeperLab, covering only the segmentation part.



This project aims at providing a fast, modular reference implementation for semantic segmentation models using PyTorch.

demo image


  • Distributed Training: more than 60% faster than the multi-thread parallel method (nn.DataParallel); we use the multi-processing parallel method. Thanks to ycszen, whose code structure this project follows.
  • Multi-GPU training and inference: supports different manners of inference.
  • Provides pre-trained models and implements different semantic segmentation models.


  • PyTorch 1.0
    • pip3 install torch torchvision
  • Easydict
    • pip3 install easydict
  • Apex
  • Ninja
    • sudo apt-get install ninja-build
  • tqdm
    • pip3 install tqdm

Pretrain Model

Model Zoo

Supported Model

  • deeperlab(CVPR2019)
    deeperlab image

Performance and Benchmarks

SS: Single Scale, MSF: Multi-scale + Flip

PASCAL VOC 2012 (with and without SBD)

Because I have only implemented the segmentation part, I tested its results on VOC.

Method | Backbone | TrainSet | EvalSet | Mean IoU(ss) | Mean IoU(msf)
:--:|:--:|:--:|:--:|:--:|:--:
deeperlab(ours+SBD) | R101v1c | *trainaug* | val | 79.71 | 80.26
deeperlab(ours) | R101v1c | *trainaug* | val | 73.28 | 74.11
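The Mean IoU columns average the per-class intersection-over-union. As a minimal sketch of that metric (not the evaluation code of this repository), it can be computed from a confusion matrix of pixel counts. The function name `mean_iou` and the toy numbers are hypothetical and only illustrate the formula.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix (rows: ground truth, columns: prediction)."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]                               # pixels of class c predicted as c
        fn = sum(confusion[c]) - tp                        # pixels of class c predicted as something else
        fp = sum(confusion[r][c] for r in range(n)) - tp   # other pixels predicted as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2-class example with made-up pixel counts (not results from this repo).
toy = [[50, 10],
       [5, 35]]
print(round(mean_iou(toy), 4))
```

Classes that never occur are skipped here so they do not drag the average down; other conventions exist, so compare numbers only when the same convention is used.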

To Do

  • [ ] Detection part

Link

We must build the environment for training:

    make link
    make others

Soft link to data, pretrain, log, logger.


  1. create the config file of dataset:

    file structure:(split with
    path-of-the-image   path-of-the-groundtruth
  2. modify the
    according to your requirements
  3. train a network:
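As an aside on step 1, a dataset list file with one image path and one ground-truth path per line could be read as sketched below. The separator is not specified above, so this sketch assumes the two paths are separated by whitespace and contain no spaces themselves. The function name `read_pairs` and the sample paths are hypothetical.

```python
def read_pairs(lines):
    """Parse 'image-path  groundtruth-path' lines into (image, groundtruth) tuples."""
    pairs = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split()  # assumption: whitespace-separated, no spaces inside paths
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 fields, got {len(fields)}")
        pairs.append((fields[0], fields[1]))
    return pairs


# Hypothetical paths, for illustration only.
sample = [
    "images/0001.jpg\tlabels/0001.png",
    "",
    "images/0002.jpg   labels/0002.png",
]
for image_path, gt_path in read_pairs(sample):
    print(image_path, "->", gt_path)
```

Failing loudly on malformed lines makes a wrong separator visible before training starts rather than halfway through an epoch.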

Distributed Training

We use the official torch.distributed.launch in order to launch multi-GPU training. This utility function from PyTorch spawns as many Python processes as the number of GPUs we want to use, and each Python process will only use a single GPU.

For each experiment, you can just run this script:

export NGPUS=8
python -m torch.distributed.launch --nproc_per_node=$NGPUS
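To make the one-process-per-GPU model concrete, here is a minimal sketch of what each spawned process typically does on startup. It is not the training script of this repository. It assumes the launcher is used without `--use_env`, in which case it passes a `--local_rank` argument to every process. The helper names `parse_local_rank` and `setup_process` are hypothetical.

```python
import argparse


def parse_local_rank(argv=None):
    """Read the --local_rank argument that torch.distributed.launch passes to each process."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    args, _ = parser.parse_known_args(argv)  # tolerate the script's other arguments
    return args.local_rank


def setup_process(local_rank):
    """Bind this process to one GPU and join the process group."""
    import torch  # imported lazily so the argument parsing can be tried without PyTorch
    import torch.distributed as dist

    torch.cuda.set_device(local_rank)
    # env:// reads MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE set by the launcher.
    dist.init_process_group(backend="nccl", init_method="env://")


if __name__ == "__main__":
    rank = parse_local_rank()
    print("local rank:", rank)
    # setup_process(rank)  # enable when started through the launcher on a GPU machine
```

After this setup, a model wrapped in `torch.nn.parallel.DistributedDataParallel` with `device_ids=[local_rank]` trains on that single GPU while gradients are synchronized across processes.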

Non-distributed Training

The performance results above were all obtained with non-distributed training. For each experiment, you can just run this script:


In, the argument of

means the GPU you want to use.


In the evaluator, we have implemented multi-GPU inference based on multi-processing. In the inference phase, the function spawns as many Python processes as the number of GPUs we want to use, and each Python process handles a subset of the whole evaluation dataset on a single GPU.

1. evaluate a trained network on the validation set:

2. input arguments in shell:
    usage: -e epoch_idx -d device_idx -c save_csv [--verbose ] 
    [--show_image] [--save_path Pred_Save_Path]
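The multi-process evaluation described above gives each process its own subset of the evaluation set. How this repository's evaluator divides the data is not shown here; the sketch below assumes contiguous, near-equal chunks, and the function name `shard_indices` is hypothetical.

```python
def shard_indices(num_samples, num_workers):
    """Split range(num_samples) into num_workers contiguous, near-equal shards."""
    base, extra = divmod(num_samples, num_workers)
    shards, start = [], 0
    for worker in range(num_workers):
        size = base + (1 if worker < extra else 0)  # the first `extra` workers take one more sample
        shards.append(range(start, start + size))
        start += size
    return shards


# Toy example: 10 validation images spread over 4 GPUs.
for device, shard in enumerate(shard_indices(10, 4)):
    print(f"GPU {device}: samples {list(shard)}")
```

Each process would then run inference on its own shard, and the per-shard statistics are merged at the end to produce the final metric.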


If you are interested in my algorithm, you can see the segmentation tools I have implemented (dfn, deeperlab, deeplabv3 plus, and so on):
- segmentation-torch

Be Careful

Because my device is a 1080, I can't use the 7*7 conv in the two 4096-channel layers: it runs out of memory. If you want to use it, you can change it in model/
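A rough back-of-the-envelope calculation shows why such a layer is heavy. The sketch below assumes a dense (non-depthwise) 7*7 convolution mapping 4096 channels to 4096 channels with float32 weights; the exact layer layout in this repository may differ, and the function name `conv_weight_bytes` is hypothetical.

```python
def conv_weight_bytes(kernel, in_channels, out_channels, bytes_per_value=4):
    """Bytes needed for the weights of a dense kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * in_channels * out_channels * bytes_per_value


weights = conv_weight_bytes(7, 4096, 4096)
print(f"{weights / 2**30:.2f} GiB for float32 weights alone")
```

Under these assumptions the weights alone take about 3 GiB. Training also needs gradients of the same size, optimizer state, and activations, which quickly exceeds the memory of a single consumer GPU.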
