ActionVLAD for video action classification (CVPR 2017)

ActionVLAD: Learning spatio-temporal aggregation for action classification

If this code helps with your work/research, please consider citing

Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic and Bryan Russell. ActionVLAD: Learning spatio-temporal aggregation for action classification. In Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

    title = {{ActionVLAD}: Learning spatio-temporal aggregation for action classification},
    author = {Girdhar, Rohit and Ramanan, Deva and Gupta, Abhinav and Sivic, Josef and Russell, Bryan},
    booktitle = {CVPR},
    year = 2017


  • July 15, 2017: Released Charades models
  • May 7, 2017: First release

Quick Fusion

If you're only looking for our final last-layer features that can be combined with your method, we provide those for the following datasets:

  1. HMDB51:
  2. Charades v1:

Note: Be careful to re-organize them given our filename and class ordering.
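A minimal sketch of that re-organization, assuming you have your own per-video feature rows keyed by video name and a list of names in the order used by the released features. The function name `align_rows` and its arguments are hypothetical and not part of this repository.

```python
def align_rows(reference_names, own_names, own_rows):
    """Reorder own_rows so that row i corresponds to reference_names[i].

    reference_names: video names in the order of the released features.
    own_names, own_rows: your video names and their feature rows, in your order.
    Raises KeyError if a reference video is missing from your data.
    """
    if len(own_names) != len(own_rows):
        raise ValueError("own_names and own_rows must have the same length")
    # Map each of your video names to its row position.
    index_of = {name: i for i, name in enumerate(own_names)}
    return [own_rows[index_of[name]] for name in reference_names]
```

Aligning by name rather than by position avoids silently fusing scores that belong to different videos.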

Docker installation

Create a docker_files folder containing cuDNN 5.1 (the include and lib directories) and the models folder.

$ docker build -t action:latest .


This code has been tested on a Linux (CentOS 6.5) system, though it should be compatible with any OS running Python and TensorFlow.

  1. TensorFlow (0.12.0rc0)

    • There have been breaking API changes in v1.0, so this code is not directly compatible with the latest tensorflow release. You can try to use my pre-compiled WHL file.
    • You may consider installing tensorflow into an environment. On anaconda, it can be done by:

      $ conda create --name tf_v0.12.0rc0
      $ source activate tf_v0.12.0rc0
      $ conda install pip  # need to install pip into this env,
                           # else it will use global pip and overwrite your
                           # main TF installation
      $ pip install h5py  # and other libs, if need to be installed
      $ git clone
      $ git checkout tags/0.12.0rc0
      $ # Compile tensorflow. Refer
      $ # If compiling on a CentOS (<7) machine, you might find the following instructions useful:
      $ #
      $ pip install --upgrade --ignore-installed /path/to/tensorflow_pkg.whl
  2. Standard python libraries

    • pip
    • scikit-learn 0.17.1
    • h5py
    • pickle, cPickle, etc.

Quick Demo

This demo runs the RGB ActionVLAD model on a video. You will need the pretrained models, which can be downloaded using the
script, as described later in this document.
$ cd demo
$ bash 

Setting up the data

The videos need to be stored on disk as individual frame JPEG files, and similarly for optical flow. The lists of train/test videos are specified by text files, similar to the one in

. Each line consists of:
video_path number_of_frames class_id
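
As an illustration of this line format, here is a small parsing sketch. It assumes the three fields are whitespace-separated and that the frame count and class id are integers. `VideoEntry` and `parse_video_list` are hypothetical names, not functions from this code base.

```python
from collections import namedtuple

VideoEntry = namedtuple("VideoEntry", ["path", "num_frames", "class_id"])


def parse_video_list(lines):
    """Parse lines of the form 'video_path number_of_frames class_id'."""
    entries = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split from the right so a path containing spaces stays intact.
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            raise ValueError("line %d: expected 3 fields, got %r" % (line_no, line))
        path, num_frames, class_id = parts
        entries.append(VideoEntry(path, int(num_frames), int(class_id)))
    return entries
```

An open file object can be passed directly, for example `parse_video_list(open(list_path))`, where `list_path` points to one of your train/test files.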

Sample train/test files are in

. The frames must be named in format:
. Flow is stored similarly, with 2(n-1) files per video (where n is the number of frames), named as
, where the
corresponds to
. This follows the data style followed in various previous works.
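
The 2(n-1) count presumably arises because there is one flow field per pair of consecutive frames (n-1 pairs), with its two components stored as separate images. The following sketch checks that count for a video; both function names are hypothetical.

```python
def expected_flow_files(num_frames):
    """Number of flow files expected for a video with num_frames frames."""
    if num_frames < 1:
        raise ValueError("a video needs at least one frame")
    # n - 1 consecutive frame pairs, two flow images per pair.
    return 2 * (num_frames - 1)


def flow_count_matches(num_frames, num_flow_files):
    """True if the flow file count agrees with the frame count."""
    return num_flow_files == expected_flow_files(num_frames)
```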

NOTE: For HMDB51, I renamed the videos to avoid issues with special characters in the filenames, and hence the numbers in the train/test files. The list of actual filenames is provided in

, and the new name for each video in that list is the 1-indexed line number of that video. The
contains all the HMDB videos that are a part of one or all of the train/test splits (it has fewer entries than
because some videos are not in any split). So, the video
in that file (and in the train/test split files) would correspond to the line number 19 in
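
A minimal lookup sketch for this renaming, assuming you have read the relevant filename list into a Python list with one entry per line. The function name `original_name` is hypothetical.

```python
def original_name(new_id, original_filenames):
    """Return the original filename for a renamed HMDB51 video.

    new_id: the numeric name used in the train/test files (1-indexed).
    original_filenames: lines of the filename list, in file order.
    """
    if not 1 <= new_id <= len(original_filenames):
        raise IndexError("video id %d is outside the list" % new_id)
    # Line numbers are 1-indexed, Python lists are 0-indexed.
    return original_filenames[new_id - 1].strip()
```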

Create soft links to the directories where the frames are stored as follows, so the provided scripts work out-of-the-box.

$ ln -s /path/to/hmdb51/frames data/hmdb51/frames
$ ln -s /path/to/hmdb51/flow data/hmdb51/flow

and so on. Since the code requires random access to this data while training, it is advisable to store the frames/flow on a fast disk/SSD.

For ease of reproduction, you can download our frames (

, 9.3GB) and optical flow (
, 4.7GB) on HMDB51. Our UCF101 models should be compatible with the data provided with the Good Practices paper.

Charades Data

The Charades data can be downloaded directly from the official website. This code assumes the 480px scaled frames are stored at


Testing pre-trained models

Download the models using
script. Comment out specific lines to download a subset of models.

Test all the models using the following scripts:

$ cd experiments
$ bash  # Stores all the features for each split
$ bash   # change split_id to get final number for each split.

The above scripts (with provided models) should reproduce the following performance. The iDT features are available from [Varol16]. You can also run these with the pre-computed features provided in the


| Split | RGB | Flow | Combined (1:2) | iDT [Varol16] | ActionVLAD+iDT |
|-------|------|------|----------------|---------------|----------------|
| 1 | 51.4 | 59.0 | 66.7 | 56.7 | 70.1 |
| 2 | 49.2 | 59.7 | 66.5 | 57.2 | 69.0 |
| 3 | 48.6 | 60.6 | 66.3 | 57.8 | 70.1 |
| Avg | 49.7 | 59.8 | 66.5 | 57.2 | 69.7 |
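
For readers combining streams themselves, here is a rough sketch of weighted late fusion. It assumes "Combined (1:2)" means the RGB and flow class scores are summed with weights 1 and 2 respectively, and that both score lists cover the same videos in the same order. The evaluation scripts in this repository may do this differently; `late_fuse` and `accuracy` are hypothetical names.

```python
def late_fuse(rgb_scores, flow_scores, rgb_weight=1.0, flow_weight=2.0):
    """Return the predicted class index per video after weighted fusion.

    rgb_scores, flow_scores: one list of per-class scores per video.
    """
    if len(rgb_scores) != len(flow_scores):
        raise ValueError("both streams must cover the same videos")
    predictions = []
    for rgb_row, flow_row in zip(rgb_scores, flow_scores):
        if len(rgb_row) != len(flow_row):
            raise ValueError("both streams must cover the same classes")
        fused = [rgb_weight * r + flow_weight * f
                 for r, f in zip(rgb_row, flow_row)]
        # argmax over classes
        predictions.append(max(range(len(fused)), key=fused.__getitem__))
    return predictions


def accuracy(predictions, labels):
    """Percentage of videos whose predicted class matches the label."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return 100.0 * correct / len(labels)
```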

NOTE: There is a very small difference (<0.1%) between the final numbers above and those reported in the paper. This was due to an undocumented behavior of tensorflow

functionality, which is slightly non-deterministic when used with multiple threads. This can lead to some local shuffling in the order of videos at test time, which leads to inconsistent results when late-fusing different methods. This has been fixed now by forcing the use of a single thread when saving features to the disk.

Charades testing

Charades models were trained using a slightly different version of TF, so they need a bit more work to test. Download the model data file as mentioned in the
script (by default, it will download). Then,
$ cp models/PreTrained/ActionVLAD-pretrained/charadesV1/checkpoint.example models/PreTrained/ActionVLAD-pretrained/charadesV1/checkpoint
$ vim models/PreTrained/ActionVLAD-pretrained/charadesV1/checkpoint
$ # modify the file and replace the $BASE_DIR with the **absolute path** of where the ActionVLAD repository is cloned to
$ # Now, for testing
$ cd experiments && bash
$ cd .. && bash eval/ data/charadesV1/feats.h5

The above should reproduce the following numbers:

| | mAP | wAP |
|------------------|-------|-------|
| ActionVLAD (RGB) | 17.66 | 25.17 |
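
The manual checkpoint edit in the steps above can also be scripted. This sketch assumes the example file contains the literal text `$BASE_DIR` wherever the repository path is needed, as the instructions suggest. `fill_base_dir` and `write_checkpoint` are hypothetical helpers.

```python
import os


def fill_base_dir(template_text, repo_dir):
    """Replace every literal $BASE_DIR with the absolute repository path."""
    return template_text.replace("$BASE_DIR", os.path.abspath(repo_dir))


def write_checkpoint(example_path, checkpoint_path, repo_dir):
    """Copy the example checkpoint file, substituting $BASE_DIR on the way."""
    with open(example_path) as src:
        text = src.read()
    with open(checkpoint_path, "w") as dst:
        dst.write(fill_base_dir(text, repo_dir))
```

For example, from the repository root, `write_checkpoint(d + "/checkpoint.example", d + "/checkpoint", ".")` with `d` set to the charadesV1 model directory used in the commands above.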


Note that in the following training steps, the RGB model is trained directly on top of an ImageNet initialization, while the flow models are trained on top of the flow stream of a two-stream model. This is because we found that training only the last few layers of the RGB stream (of a two-stream model) gives good enough performance, so everything up to and including conv5_3 is left at its ImageNet initialization. Since we build our model on top of conv5_3, we end up essentially training on top of the ImageNet initialization.

RGB model

$ ### Initialization for ActionVLAD (KMeans)
$ cd experiments
$ bash  # extract random subset of features
$  # cluster the features to initialize ActionVLAD
$ ### Training the model
$ bash  # trains the last layer with fixed ActionVLAD
$ bash  # trains the last layer+actionVLAD+conv5
$ bash  # evaluates the final trained model

Flow model

$ ### Initialization for ActionVLAD (KMeans)
$ cd experiments
$ bash  # extract random subset of features
$  # cluster the features to initialize ActionVLAD
$ ### Training the model
$ bash  # trains the last layer with fixed ActionVLAD
$ bash  # trains the last layer+actionVLAD+conv5
$ bash  # evaluates the final trained model
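
Both recipes above begin by clustering a random subset of features to initialize ActionVLAD. To illustrate the idea only, here is a plain Lloyd's k-means sketch in pure Python. It is not the repository's clustering script, the number of clusters and feature dimensions used there are not shown here, and in practice a library implementation such as scikit-learn's `KMeans` would be the more sensible choice.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points into k groups and return the k cluster centers."""
    rng = random.Random(seed)
    # Initialize centers with k distinct randomly chosen data points.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        # A center with no members keeps its previous position.
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                centers[j] = [sum(m[d] for m in members) / float(len(members))
                              for d in range(dim)]
    return centers
```

The resulting centers would then serve as the initial anchor points that are later fine-tuned together with the rest of the network.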


Two-stream models

The following scripts run testing on the flow stream of our two-stream models. As mentioned earlier, we didn't need an RGB stream model for ActionVLAD training since we could train directly on top of the ImageNet initialization.

$ cd experiments
$ bash

You can also train two-stream models using this code base. Here's a sample script to train an RGB stream (not tested, so it might require some tuning of hyperparameters):

$ cd experiments
$ bash
$ bash


[Varol16]: Gul Varol, Ivan Laptev and Cordelia Schmid. Long-term Convolutions for Action Recognition. arXiv 2016.


This code is based on the tensorflow/models repository, so thanks to the original authors/maintainers for releasing the code.
