
171 Stars 33 Forks Apache License 2.0 107 Commits 0 Opened issues


Real-time 3D human pose estimation, implemented in TensorFlow



A TensorFlow implementation of VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera.

For the Caffe model and weights required by this repository, please contact the author of the paper.


  • Python 3.x
  • tensorflow-gpu 1.x
  • pycaffe


Fedora 29

Install python dependencies:

pip3 install -r requirements.txt --user

Install caffe dependencies

sudo dnf install protobuf-devel leveldb-devel snappy-devel opencv-devel boost-devel hdf5-devel glog-devel gflags-devel lmdb-devel atlas-devel python-lxml boost-python3-devel

Setup Caffe

git clone
cd caffe

Configure Makefile.config (Include python3 and fix path)

Build Caffe

sudo make all
sudo make runtest
sudo make pycaffe
sudo make distribute
sudo cp .build_release/lib/* /usr/lib64
sudo cp -a distribute/python/caffe/ /usr/lib/python3.7/site-packages/



  1. Drop the pretrained caffe model into
  2. Run
    to generate tensorflow model.


    is a script for video stream.
  2. (Recommended)
    is a multiprocessing version of the script. If the 3D plotting function shuts down in
    mentioned above, try this one instead.
    is a script for picture.
  4. (Deprecated)
    is a class implementation containing all the elements needed to run the model.
  5. (Deprecated)
    additionally provides ROS network and/or serial connection for communication in robot controlling.
  6. (Deprecated) The training script
    is not complete yet (I failed to reconstruct the model), so do not use it. Pull requests are welcome.

[Tips] To run the scripts for video stream:

  1. Click the left mouse button to initialize the bounding box, which is computed by a simple HOG method;

  2. Press any key to exit while the script is running.
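To illustrate what initializing a bounding box from a detector can involve, here is a minimal sketch. It assumes the HOG step yields a list of `(x, y, w, h)` person rectangles (the format returned by, for example, OpenCV's `HOGDescriptor.detectMultiScale`) and that the model wants a square crop. The helper name `pick_square_box`, the `pad` parameter, and the square-crop policy are all hypothetical and not taken from this repository.

```python
def pick_square_box(boxes, frame_w, frame_h, pad=0.1):
    """Pick the largest (x, y, w, h) detection and expand it to a padded
    square that stays inside the frame. Returns (left, top, side) or None.

    Hypothetical helper: the selection and padding policy are assumptions.
    """
    if not boxes:
        return None  # nothing detected, caller should retry on a later frame

    # Keep the detection with the largest area.
    x, y, w, h = max(boxes, key=lambda b: b[2] * b[3])

    # Square side: longer edge plus padding on both sides, capped by the frame.
    side = int(round(max(w, h) * (1 + 2 * pad)))
    side = min(side, frame_w, frame_h)

    # Center the square on the detection, then clamp it into the frame.
    cx, cy = x + w / 2, y + h / 2
    left = int(round(cx - side / 2))
    top = int(round(cy - side / 2))
    left = min(max(left, 0), frame_w - side)
    top = min(max(top, 0), frame_h - side)
    return left, top, side
```

Returning `None` when no person is found lets the caller keep showing the raw frame until a later click succeeds.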


  1. In certain programming environments, the 3D plotting function (from matplotlib) in
    shuts down. Use
  2. The input image is in BGR color format and the pixel value is mapped into a range of [-0.4, 0.6).
  3. The joint-parent map (detailed information in

  1. Here I have a sketch to show the joint positions (don't laugh lol):

  1. Every input image is assumed to contain all 21 joints, so the model easily produces wrong results when a joint is not actually in the picture.
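The input normalization mentioned in the notes above can be sketched as follows. This is a minimal example, assuming 8-bit pixel values and a divisor of 256, which is one way to obtain the half-open range [-0.4, 0.6); the exact scaling used by the scripts in this repository may differ, and the function name `normalize_bgr` is hypothetical.

```python
import numpy as np


def normalize_bgr(image_bgr):
    """Map 8-bit BGR pixel values from [0, 255] into [-0.4, 0.6).

    Assumption: scaling by 1/256 then shifting by -0.4. The channel
    order is left untouched, so a BGR input stays BGR.
    """
    img = np.asarray(image_bgr, dtype=np.float32)
    return img / 256.0 - 0.4


# Example: a tiny 2x2 BGR image holding the darkest and brightest values.
sample = np.array([[[0, 0, 0], [255, 255, 255]],
                   [[128, 64, 32], [10, 20, 30]]], dtype=np.uint8)
normalized = normalize_bgr(sample)
```

Images loaded with OpenCV's `cv2.imread` are already in BGR order, so no channel swap is needed before this step.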

About Training Data

For the MPI-INF-3DHP dataset, refer to my other repository.

Reference Repositories
