Human-Segmentation-PyTorch
Human segmentation models, training/inference code, and trained weights, implemented in PyTorch


Supported networks

To assess the architecture, memory usage, forward time (on either CPU or GPU), number of parameters, and number of FLOPs of a network, use this command:



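The repository's measurement command is not reproduced here. As a rough illustration of what "number of parameters" and "FLOPs" mean for a single convolution layer, here is a small hand calculation (the helper functions below are my own, not from the repo):

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Weight count (c_out * c_in * k * k) plus optional biases."""
    return c_out * c_in * k * k + (c_out if bias else 0)

def conv2d_flops(c_in, c_out, k, h_out, w_out, bias=True):
    """Multiply-add count for one forward pass over an h_out x w_out output map."""
    per_position = c_in * k * k + (1 if bias else 0)
    return c_out * h_out * w_out * per_position

# Example: a typical ResNet18 stem on a 320x320 input:
# 7x7 conv, stride 2, 3 -> 64 channels, 160x160 output, no bias.
print(conv2d_params(3, 64, 7, bias=False))           # 9408 weights
print(conv2d_flops(3, 64, 7, 160, 160, bias=False))  # 240844800 multiply-adds
```

Summing such per-layer counts over a whole network is how tools arrive at totals like the "4.7M parameters / 1.3G FLOPs" figures in the benchmark table.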
Portrait Segmentation (Human/Background):
  • Automatic Portrait Segmentation for Image Stylization: 1800 images
  • Supervisely Person: 5711 images


  • Python 3.6.x is used in this repository.
  • Clone the repository:
    git clone --recursive
    cd Human-Segmentation-PyTorch
    git submodule sync
    git submodule update --init --recursive
  • To install the required packages, activate your virtual environment (the workon command assumes virtualenvwrapper) and use pip:
    workon humanseg
    pip install -r requirements.txt
    pip install -e models/pytorch-image-models


  • To train a network from scratch, for example DeepLab3+, use this command:
    python --config config/config_DeepLab.json --device 0
    where config/config_DeepLab.json is the configuration file, which specifies the network, dataloader, optimizer, loss, metric, and visualization settings.
  • To resume training from a checkpoint, use this command:
    python --config config/config_DeepLab.json --device 0 --resume path_to_checkpoint/model_best.pth
  • You can monitor training progress in TensorBoard by enabling the visualization mode in the configuration file.
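The exact schema of config/config_DeepLab.json is not shown in this document. The JSON sketch below only illustrates the kinds of settings such a file groups together; every field name here is illustrative, not the repo's actual schema:

```json
{
  "name": "DeepLab_example",
  "model": { "type": "DeepLabV3Plus", "backbone": "resnet18", "num_classes": 2 },
  "dataloader": { "input_size": 320, "batch_size": 8, "shuffle": true },
  "optimizer": { "type": "SGD", "lr": 0.01, "momentum": 0.9 },
  "loss": "cross_entropy",
  "metrics": ["miou"],
  "visualization": { "tensorboard": true }
}
```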


There are two modes of inference: video and webcam.

python --watch --use_cuda --checkpoint path_to_checkpoint/model_best.pth
python --use_cuda --checkpoint path_to_checkpoint/model_best.pth
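In both modes, once the model produces a soft foreground mask for a frame, putting the person over a new background is a per-pixel alpha blend. A minimal sketch of that compositing step, assuming NumPy arrays (this is not the repo's actual inference code):

```python
import numpy as np

def composite(frame, mask, background):
    """Alpha-blend frame over background using a soft mask in [0, 1].

    frame, background: (H, W, 3) uint8 images; mask: (H, W) float array.
    """
    alpha = mask[..., None]  # broadcast the mask over the channel axis
    blended = alpha * frame.astype(np.float32) + (1.0 - alpha) * background.astype(np.float32)
    return blended.astype(np.uint8)

# Toy example: a 2x2 frame whose left column is fully foreground.
frame = np.full((2, 2, 3), 200, dtype=np.uint8)
background = np.zeros((2, 2, 3), dtype=np.uint8)
mask = np.array([[1.0, 0.0], [1.0, 0.0]])
print(composite(frame, mask, background)[:, :, 0])  # left column 200, right column 0
```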


  • Networks are trained on a dataset combining the two datasets mentioned above: 6627 training images and 737 testing images.
  • The model input size is set to 320×320.
  • CPU and GPU times are inference times averaged over 10 runs (preceded by 10 warm-up runs) with batch size 1.
  • The mIoU is measured on the testing subset (737 images) of the combined dataset.
  • Hardware configuration for benchmarking:
    CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
    GPU: GeForce GTX 1050 Mobile, CUDA 9.0
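The warm-up-then-average protocol above can be sketched in plain Python; the timing helper is my own, not the repo's benchmark code:

```python
import time

def benchmark(fn, warmup=10, runs=10):
    """Average wall-clock time of fn() over `runs` calls, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# Example with a stand-in workload instead of a model's forward pass:
avg = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"{avg * 1000:.3f} ms per run")
```

For GPU measurements this wall-clock approach is only valid if the device is synchronized (e.g. torch.cuda.synchronize()) before each clock read, since CUDA kernel launches are asynchronous.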

| Model | Parameters | FLOPs | CPU time | GPU time | mIoU |
|:-:|:-:|:-:|:-:|:-:|:-:|
| UNet_MobileNetV2 (alpha=1.0, expansion=6) | 4.7M | 1.3G | 167ms | 17ms | 91.37% |
| UNet_ResNet18 | 16.6M | 9.1G | 165ms | 21ms | 90.09% |
| DeepLab3+_ResNet18 | 16.6M | 9.1G | 133ms | 28ms | 91.21% |
| BiSeNet_ResNet18 | 11.9M | 4.7G | 88ms | 10ms | 87.02% |
| PSPNet_ResNet18 | 12.6M | 20.7G | 235ms | 666ms | --- |
| ICNet_ResNet18 | 11.6M | 2.0G | 48ms | 55ms | 86.27% |
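For reference, mIoU on a binary human/background task is the average of the per-class intersection-over-union scores. A minimal NumPy sketch of the metric (not the repo's metric implementation):

```python
import numpy as np

def mean_iou(pred, target, num_classes=2):
    """Mean of per-class IoU; classes absent from both pred and target are skipped."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x2 label maps: 1 = human, 0 = background.
pred   = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
print(mean_iou(pred, target))  # (2/3 background IoU + 1/2 human IoU) / 2 ≈ 0.583
```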
