


A general video understanding codebase from SenseTime X-Lab


Easily implement SOTA video understanding methods with PyTorch on multiple machines and GPUs

X-Temporal is an open-source video understanding codebase from the SenseTime X-Lab group that provides state-of-the-art video classification models, including implementations of the papers "Temporal Segment Networks", "Temporal Interlacing Network", "Temporal Shift Module", "ResNet 3D", "SlowFast Networks for Video Recognition", and "Non-local Neural Networks".

This repo includes all models and code used in our 1st place solution to the ICCV 2019 Multi-Moments in Time Challenge.


Features

  • Support popular video understanding frameworks
    • SlowFast
    • R(2+1)D
    • R3D
    • TSN
    • TIN
    • TSM
  • Support various datasets (Kinetics, Something2Something, Multi-Moments in Time...)
    • Take raw video as input
    • Take video RGB frames as input
    • Take video Flow frames as input
    • Support Multi-label dataset
  • High-performance, modular design enables rapid implementation and evaluation of novel video research ideas.


v0.1.0 (08/04/2020)

X-Temporal is online!

Get started


The code is built with the following libraries:

For extracting frames from video data, you may need ffmpeg.


  1. Clone the repo:
    git clone  X-Temporal
    cd X-Temporal
  2. Run the install script.

Prepare dataset

Each row of the dataset meta file describes one video and has three columns: the folder of extracted frames, the number of frames, and the category id. For example:

abseiling/Tdd9inAW1VY_000361_000371 300 0
zumba/x0KPHFRbzDo_000087_000097 300 599

You can also read the original video files directly; X-Temporal uses the Decord library for on-the-fly video frame extraction.

abseiling/Tdd9inAW1VY_000361_000371.mkv 300 0
zumba/x0KPHFRbzDo_000087_000097.mkv 300 599

In the tools folder, scripts are provided for extracting frames and generating dataset meta files.
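To make the format concrete, here is a minimal sketch of parsing such a meta file in Python. The `parse_meta` helper is hypothetical, written for illustration only; it is not part of X-Temporal:

```python
# Hypothetical helper (not an X-Temporal API): parse meta-file lines
# of the form "<path> <frame count> <category id>" into tuples.
def parse_meta(lines):
    records = []
    for line in lines:
        path, num_frames, label = line.split()
        records.append((path, int(num_frames), int(label)))
    return records

meta = [
    "abseiling/Tdd9inAW1VY_000361_000371 300 0",
    "zumba/x0KPHFRbzDo_000087_000097 300 599",
]
records = parse_meta(meta)
print(records[1])  # ('zumba/x0KPHFRbzDo_000087_000097', 300, 599)
```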

About multi-label classification

Each row of a multi-label dataset meta file contains the video path, the number of frames, and a comma-separated list of category ids:

trimming/getty-cutting-meat-cleaver-video-id163936215_13.mp4 90 144,246
exercising/meta-935267_68.mp4 92 69
cooking/yt-SSLy25MQb9g_307.mp4 91 264,311,7,188,246

YAML config:

    loss_type: bce
    multi_class: True
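As an illustration of what the `bce` / `multi_class` combination implies (this is a sketch, not X-Temporal's actual loader code), the comma-separated category column can be expanded into the multi-hot target vector a BCE loss expects. The `to_multi_hot` helper and the `num_classes` value are made up for the example:

```python
# Illustrative sketch (not X-Temporal code): expand the comma-separated
# category column of a multi-label meta line into a multi-hot vector.
def to_multi_hot(line, num_classes):
    path, num_frames, labels = line.split()
    target = [0.0] * num_classes
    for idx in labels.split(","):
        target[int(idx)] = 1.0
    return path, int(num_frames), target

path, n_frames, target = to_multi_hot(
    "trimming/getty-cutting-meat-cleaver-video-id163936215_13.mp4 90 144,246",
    num_classes=313,  # arbitrary class count for the sketch
)
print(sum(target))  # 2.0 -> two active categories
```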


Training

  1. Create a folder for the experiment.

    cd /path/to/X-Temporal
    mkdir -p experiments/test
  2. Create a new config, or copy one from an existing experiment.

    cp experiments/r2plus1d/default.config experiments/test
    cp experiments/r2plus1d/ experiments/test
  3. Set up the training script, where ROOT and the cfg file may need to be changed for your setup.

    T=`date +%m%d%H%M`
    ROOT=../..
    cfg=default.yaml

    python $ROOT/x_temporal/ --config $cfg | tee log.train.$T
  4. Start training.


Testing

  1. Set the resume_model path in the config.

    saver: # Required.
      resume_model: checkpoints/ckpt_e13.pth # checkpoint to test
  2. Set the evaluation parameters in the config, for example to use multiple spatial crops and temporal samples at test time (it is recommended to reduce the batch size by the same proportion).

    spatial_crops: 3
    temporal_samples: 10
  3. Modify or create the test script; the main change from the training script is the log file name.

    T=`date +%m%d%H%M`
    ROOT=../..
    cfg=default.yaml

    python $ROOT/x_temporal/ --config $cfg | tee log.test.$T
  4. Start testing.

    bash ./
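The batch-size advice in step 2 follows from simple arithmetic: multi-crop testing evaluates every video as spatial_crops × temporal_samples clips, so per-video memory grows by that factor. A quick sketch, where the single-view batch size of 64 is an arbitrary example:

```python
# Multi-crop testing runs spatial_crops * temporal_samples clips per
# video, so memory per video grows by the same factor.
spatial_crops = 3
temporal_samples = 10
views_per_video = spatial_crops * temporal_samples
print(views_per_video)  # 30

# Scale the batch size down proportionally (64 is an arbitrary example
# of a batch size that fits in memory for single-view testing).
batch_size = max(1, 64 // views_per_video)
print(batch_size)  # 2
```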



License

X-Temporal is released under the MIT license.

Configuration details
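The full configuration reference is not reproduced here, but the options shown elsewhere in this README can be collected into one sketch. The section layout (e.g. the `evaluate` section name) is an assumption and should be checked against the configs shipped in experiments/:

```yaml
# Sketch assembled from options shown in this README; the section
# layout is assumed, not verified against the actual config schema.
loss_type: bce          # use BCE for multi-label training
multi_class: True

evaluate:               # assumed section name for test-time options
  spatial_crops: 3
  temporal_samples: 10

saver:                  # Required.
  resume_model: checkpoints/ckpt_e13.pth  # checkpoint to test
```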


Kindly cite our publications if this repo and its algorithms help your research.

```
@article{zhang2020top,
  title={Top-1 Solution of Multi-Moments in Time Challenge 2019},
  author={Zhang, Manyuan and Shao, Hao and Song, Guanglu and Liu, Yu and Yan, Junjie},
  journal={arXiv preprint arXiv:2003.05837},
  year={2020}
}

@article{shao2020temporal,
  title={Temporal Interlacing Network},
  author={Hao Shao and Shengju Qian and Yu Liu},
  journal={AAAI},
  year={2020}
}
```


X-Temporal is maintained by Hao Shao, Manyuan Zhang, and Yu Liu.
