(ACM MM 20 Oral) PyTorch implementation of Self-supervised Dance Video Synthesis Conditioned on Music


Self-supervised Dance Video Synthesis Conditioned on Music

PyTorch implementation of the paper by Xuanchi Ren, Haoran Li, Zijian Huang, and Qifeng Chen

To appear in ACM MM 2020

[Paper] [Paper_MM]

The demo video is shown at:

The dataset and the code for training and testing are released.

A notebook for demo and quick start will be provided soon.

Some demos:

More samples can be seen in the demo video.


Python 3.5 + PyTorch 1.0

For the testing part, you should install ffmpeg, which is used to produce the music video.

We use tensorboardX for logging. If you have not installed it, you can just comment out the line in
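As an alternative to commenting out the logging line, the import can be made optional. The following is a minimal sketch, not the repository's code: `NullWriter`, `make_writer`, and the `./log` directory are hypothetical names introduced here for illustration. Only `SummaryWriter` and its `add_scalar` and `close` methods come from tensorboardX.

```python
# Optional logging: use tensorboardX when it is installed,
# otherwise fall back to a writer that discards everything.
try:
    from tensorboardX import SummaryWriter  # optional dependency
except ImportError:
    SummaryWriter = None


class NullWriter:
    """Hypothetical stand-in that accepts logging calls and ignores them."""

    def add_scalar(self, tag, value, global_step=None):
        pass

    def close(self):
        pass


def make_writer(log_dir="./log"):
    # log_dir is an assumed default, not a path taken from the repository.
    if SummaryWriter is None:
        return NullWriter()
    return SummaryWriter(log_dir)
```

With this pattern, training code can call `writer.add_scalar("loss", value, step)` unconditionally, whether or not tensorboardX is available.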


This training process is intended for the clean part of the dataset, which can be downloaded here.

  1. Download the dataset and put it under ./dataset
  2. Run
    The training script will load the config of
    If you want to train the model on other datasets, you should change the config in


If you want to use the pretrained model, first download it from here, put it under "pretrainmodel", and change the path of to "./pretrainmodel/generator0400.pth".

1. Run
python --output the_output_path
2. Turn the output skeleton sequence into a music video
cd Demo
Note that you should change the paths and the "max" variable in
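For readers who want to see what combining rendered frames with the music track involves, here is a rough sketch of calling ffmpeg from Python. It is not the script in the Demo folder: the function names, the frame naming pattern, and the default frame rate of 25 are all assumptions made for illustration. The ffmpeg options themselves (`-framerate`, `-i`, `-c:v`, `-pix_fmt`, `-shortest`, `-y`) are standard.

```python
import subprocess


def build_ffmpeg_command(frame_pattern, audio_path, output_path, fps=25):
    """Return an ffmpeg argument list that muxes image frames with an audio track."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-framerate", str(fps),   # frame rate of the image sequence (assumed value)
        "-i", frame_pattern,      # e.g. "frames/%d.png" (hypothetical naming)
        "-i", audio_path,         # music track
        "-c:v", "libx264",        # encode video with H.264
        "-pix_fmt", "yuv420p",    # widely supported pixel format
        "-shortest",              # stop at the shorter of video and audio
        output_path,
    ]


def render_music_video(frame_pattern, audio_path, output_path, fps=25):
    # Raises subprocess.CalledProcessError if ffmpeg exits with a non-zero status.
    cmd = build_ffmpeg_command(frame_pattern, audio_path, output_path, fps)
    subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes the command easy to inspect or print before ffmpeg is invoked.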


For this part, we adapt the method of the paper "Everybody Dance Now".

We use this PyTorch implementation.


For the proposed cross-modal metric in our paper, we re-implement the paper: Human Motion Analysis with Deep Metric Learning (ECCV 2018).

The implementation of this paper can be seen at:


To use the dataset, please refer to the notebook "dataset/usage_dataset.ipynb"

As stated in the paper, we collected 60 videos in total and divided them into two parts according to the cleanness of the skeletons.

The clean part (40 videos):

The noisy part (20 videos):

To support further study, we also provide other collected data:




We also provide a BaiduNetDisk version, which includes the whole dataset:


If you have questions about our work, please email [email protected]


If you use this code for your research, please cite our paper.

author = {Xuanchi Ren, Haoran Li, Zijian Huang, Qifeng Chen},
title = {Self-supervised Dance Video Synthesis Conditioned on Music},
booktitle = {ACM MM},
year = {2020}
