
Pytorch Realtime Multi-Person Pose Estimation

This is a PyTorch version of Realtime Multi-Person Pose Estimation; the original code is here.


Code for reproducing the CVPR 2017 oral paper, implemented in PyTorch.



The result is generated by a model trained for 30 epochs.


1. preprocessing: scripts for preprocessing the data.

2. training: scripts for training the networks.

3. testing: the test script and an example.

4. caffe2pytorch: the script for converting the Caffe model.

5. caffe_model: the Caffe model.


Pytorch: 0.2.0_3

Caffe: needed only if you want to convert the caffemodel yourself.

The preprocessing code provides a transformer that transforms the image, mask, keypoints, and center points together, and a dataset reader that loads the data for the network. When an image is read, the corresponding PAF vectors and heatmaps are generated.
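The heatmap side can be sketched as rendering each keypoint as a 2D Gaussian centered on the keypoint. The map size and sigma below are illustrative assumptions, not the repo's actual values:

```python
import math

def gaussian_heatmap(width, height, cx, cy, sigma=7.0):
    """Render one keypoint at (cx, cy) as a 2D Gaussian heatmap
    (row-major nested list, peak value 1.0 at the keypoint)."""
    heatmap = []
    for y in range(height):
        row = []
        for x in range(width):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            row.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        heatmap.append(row)
    return heatmap

hm = gaussian_heatmap(46, 46, cx=20, cy=10)
print(hm[10][20])  # 1.0 exactly at the keypoint
```

In the real pipeline one such map is produced per body part (plus a background channel), and overlapping people are merged by taking the per-pixel maximum.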

The PAF vectors pair up the parts as follows:

    {3,  4},
    {4,  5},
    {6,  7},
    {7,  8},
    {9,  10},
    {10, 11},
    {12, 13},
    {13, 14},
    {1,  2},
    {2,  9},
    {2,  12},
    {2,  3},
    {2,  6},
    {3,  17},
    {6,  18},
    {1,  16},
    {1,  15},
    {16, 17},
    {15, 18},

Each index is the key value corresponding to a body part in POSECOCOBODY_PARTS. A utilities script provides common functions such as adjusting the learning rate and reading the configuration.
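The PAF side can be sketched in the same spirit: for each limb pair above, every pixel near the segment between the two joints stores the unit vector pointing from the first joint to the second. The map size and thickness below are illustrative assumptions:

```python
import math

# The limb pairs listed above (1-based part indices)
LIMB_PAIRS = [
    (3, 4), (4, 5), (6, 7), (7, 8), (9, 10), (10, 11),
    (12, 13), (13, 14), (1, 2), (2, 9), (2, 12), (2, 3),
    (2, 6), (3, 17), (6, 18), (1, 16), (1, 15), (16, 17), (15, 18),
]

def paf_for_limb(width, height, p1, p2, thickness=1.0):
    """Fill a 2-channel PAF map with the unit vector p1 -> p2 on
    pixels lying close to the limb segment."""
    (x1, y1), (x2, y2) = p1, p2
    length = math.hypot(x2 - x1, y2 - y1)
    ux, uy = (x2 - x1) / length, (y2 - y1) / length
    paf_x = [[0.0] * width for _ in range(height)]
    paf_y = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # projection along the limb axis, and perpendicular distance
            along = (x - x1) * ux + (y - y1) * uy
            perp = abs((x - x1) * uy - (y - y1) * ux)
            if 0 <= along <= length and perp <= thickness:
                paf_x[y][x], paf_y[y][x] = ux, uy
    return paf_x, paf_y

px, py = paf_for_limb(10, 10, (1, 5), (8, 5))
print(px[5][4], py[5][4])  # horizontal limb: unit vector (1.0, 0.0)
```

The full pipeline produces one such x/y channel pair per limb in LIMB_PAIRS, averaging vectors where limbs of different people overlap.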

visualize_input.ipynb: a script to verify the validity of the preprocessing and of the generated heatmaps and vectors; it shows some examples. A separate script defines the structure of the networks.

The first 10 layers are the same as in VGG-19, so if pretrained is set to True, they are initialized from VGG-19. There are 6 stages: the first stage has 5 layers (3 3x3 convs + 2 1x1 convs) and the remaining stages have 7 layers (5 3x3 convs + 2 1x1 convs).
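The per-stage layer counts described above can be summarized in a small pure-Python sketch (illustrative only, following the description rather than the repo's code):

```python
def stage_kernel_sizes(stage):
    """Kernel sizes for one stage, as described above:
    stage 1: 3 conv3x3 + 2 conv1x1 (5 layers);
    stages 2-6: 5 conv3x3 + 2 conv1x1 (7 layers)."""
    n_3x3 = 3 if stage == 1 else 5
    return [3] * n_3x3 + [1, 1]

stages = [stage_kernel_sizes(s) for s in range(1, 7)]
print([len(k) for k in stages])  # [5, 7, 7, 7, 7, 7]
```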

TODO: make the number of stages adjustable.

Training steps

  • Download the dataset, annotations, and the COCO official toolbox
  • Go to the "preprocessing" folder
    cd preprocessing
  • Generate json file and masks
    python generate_json_mask.py
  • Go to the "training" folder
    cd ../training
  • Set the train parameters in "config.yml".
  • Set the train data dir, train mask dir, and train json filepath, and the val data dir, val mask dir, and val json filepath.
  • Train the model


  • If you want to train on other datasets, change the corresponding code to match your dataset. Also, please ensure that '0' corresponds to the background.
  • Both the converted model and my code use BGR channel order for training and test images.
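Note that OpenCV's cv2.imread already returns images in BGR order; if your loader yields RGB (PIL, for example), reverse the channels first. A minimal illustration on nested tuples:

```python
def rgb_to_bgr(rows):
    """Reverse the channel order of each (R, G, B) pixel to (B, G, R)."""
    return [[px[::-1] for px in row] for row in rows]

image = [[(255, 0, 0), (0, 255, 0)]]  # one red and one green pixel, in RGB
print(rgb_to_bgr(image))  # [[(0, 0, 255), (0, 255, 0)]]
```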


Please cite the paper in your publications if it helps your research:

    @inproceedings{cao2017realtime,
      title = {Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields},
      author = {Zhe Cao and Tomas Simon and Shih-En Wei and Yaser Sheikh},
      booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      year = {2017}
    }


The repo is freely available for non-commercial use. Please see the license for further details.
