[ECCV-20] 3D human scene interaction dataset:

GTA-IM Dataset [Website]

Long-term Human Motion Prediction with Scene Context, ECCV 2020 (Oral) PDF
Zhe Cao, Hang Gao, Karttikeya Mangalam, Qi-Zhi Cai, Minh Vo, Jitendra Malik.

This repository maintains our GTA Indoor Motion dataset (GTA-IM), which emphasizes human-scene interactions in indoor environments. We collect HD RGB-D image sequences of 3D human motion from a realistic game engine. The dataset has clean 3D human pose and camera pose annotations, and large diversity in human appearances, indoor environments, camera views, and human activities.

Table of contents
1. A demo for playing with our dataset.
2. Instructions to request our full dataset.
3. Documentation on our dataset structure and contents.


(0) Getting Started

Clone this repository and create a local environment:

conda env create -f environment.yml

For your convenience, we provide a fragment of our data in the demo directory (the path passed as -pa demo in the commands below). In this section, you can explore different parts of our data using the maintained tool scripts.

(1) 3D skeleton & point cloud

$ python -h
usage: [-h] [-pa PATH] [-f FRAME] [-fw FUSION_WINDOW]

Now visualize the demo 3D skeleton and point cloud:

$ python -pa demo -f 2720 -fw 80

You should be able to see an Open3D viewer with our 3D skeleton and point cloud data. Press 'h' in the viewer to see how to control the viewpoint.

Note that we use open3d == 0.7.0; the visualization code is not compatible with newer versions of Open3D.
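Because the visualization code depends on that exact release, it can help to check the installed version before launching the viewer. The snippet below is a minimal sketch, not part of this repository: `version_tuple` and `matches_pinned` are hypothetical helper names, and comparing only the first three version components is an assumption made so that strings with an extra trailing component still match.

```python
import re


def version_tuple(text, width=3):
    """Parse the first `width` dot-separated numeric components of a version string."""
    numbers = []
    for piece in text.split(".")[:width]:
        match = re.match(r"\d+", piece)  # leading digits only; suffixes are ignored
        numbers.append(int(match.group()) if match else 0)
    numbers += [0] * (width - len(numbers))  # pad short versions such as "0.7"
    return tuple(numbers)


def matches_pinned(found, pinned="0.7.0"):
    """Return True when `found` names the same major.minor.patch release as `pinned`."""
    return version_tuple(found) == version_tuple(pinned)
```

With Open3D installed, the check would be `matches_pinned(open3d.__version__)` after `import open3d`.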

(2) 2D skeleton & depth map

$ python -h
usage: [-h] [-pa PATH]

Now visualize the 2D skeleton and depth map:

$ python -pa demo

You should be able to find a newly created directory with output that renders to a movie strip.

(3) RGB video

$ python -h
usage: [-h] [-pa PATH] [-s SCALE] [-fr FRAME_RATE]

Now visualize the demo video:

$ python -pa demo -fr 15

You should be able to find a newly created directory containing the rendered video.

Requesting Dataset

To obtain the Dataset, please send an email to Zhe Cao (with the title "GTA-IM Dataset Download") stating:

  • Your name, title and affiliation
  • Your intended use of the data
  • The following statement: "With this email we declare that we will use the GTA-IM Dataset for non-commercial research purposes only. We also undertake to purchase a copy of Grand Theft Auto V. We will not redistribute the data in any form except in academic publications where necessary to present examples."

We will promptly reply with the download link.

Dataset Contents

After you download data from our link and unzip, each sequence folder will contain the following files:

  • images
    • color images:
    • depth images:
    • instance masks:

  • info_frames.pickle
    : a pickle file that contains camera information, 3D human poses (98 joints) in the global coordinate system, weather conditions, the character ID, and so on.
    import pickle
    # data_path is the sequence folder path, ending with a path separator
    with open(data_path + 'info_frames.pickle', 'rb') as f:
        info = pickle.load(f)

  • info_frames.npz
    : it contains five arrays. 21 joints out of the 98 human joints are extracted to form the minimal skeleton. Here is how we generate it from raw captures.
    • joints_2d
      : 2d human poses on the HD image plane.
    • joints_3d_cam
      : 3d human poses in the current frame's camera coordinate
    • joints_3d_world
      : 3d human poses in the game/world coordinate
    • world2cam_trans
      : the world to camera transformation matrix for each frame
    • intrinsics
      : camera intrinsics

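    import numpy as np
    # rec_idx is the sequence folder path, ending with a path separator
    info_npz = np.load(rec_idx + 'info_frames.npz')
    # 2d poses for frame 0
    print(info_npz['joints_2d'][0])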

  • realtimeinfo.pickle
    : a backup pickle file which contains all information from the data collection.
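The arrays in info_frames.npz are related: applying world2cam_trans to joints_3d_world should reproduce joints_3d_cam for the same frame. The sketch below shows one way to check that relation. It rests on assumptions that are not stated in this README: that the transform is a 4x4 homogeneous matrix, and that it follows either the column-vector or the row-vector convention. The helper names `to_camera` and `detect_convention` are hypothetical and not part of the repository.

```python
import numpy as np


def to_camera(joints_world, world2cam, convention="column"):
    """Apply a 4x4 homogeneous transform to (J, 3) joints.

    convention="column": x_cam = M @ x_world (points as column vectors)
    convention="row":    x_cam = x_world @ M (points as row vectors)
    """
    joints_world = np.asarray(joints_world, dtype=float)
    world2cam = np.asarray(world2cam, dtype=float)
    ones = np.ones((joints_world.shape[0], 1))
    homogeneous = np.hstack([joints_world, ones])  # (J, 4)
    if convention == "column":
        transformed = (world2cam @ homogeneous.T).T
    elif convention == "row":
        transformed = homogeneous @ world2cam
    else:
        raise ValueError("convention must be 'column' or 'row'")
    return transformed[:, :3]


def detect_convention(joints_world, joints_cam, world2cam, atol=1e-4):
    """Return the convention that maps joints_world onto joints_cam, or None."""
    for convention in ("column", "row"):
        candidate = to_camera(joints_world, world2cam, convention)
        if np.allclose(candidate, joints_cam, atol=atol):
            return convention
    return None
```

For frame `i`, one would pass `info_npz['joints_3d_world'][i]`, `info_npz['joints_3d_cam'][i]` and `info_npz['world2cam_trans'][i]`. A result of `None` means neither assumed convention holds and the stored layout should be inspected directly.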

Joint Types

The human skeleton connections and joint index names:

    (0, 1),  # head_center -> neck
    (1, 2),  # neck -> right_clavicle
    (2, 3),  # right_clavicle -> right_shoulder
    (3, 4),  # right_shoulder -> right_elbow
    (4, 5),  # right_elbow -> right_wrist
    (1, 6),  # neck -> left_clavicle
    (6, 7),  # left_clavicle -> left_shoulder
    (7, 8),  # left_shoulder -> left_elbow
    (8, 9),  # left_elbow -> left_wrist
    (1, 10),  # neck -> spine0
    (10, 11),  # spine0 -> spine1
    (11, 12),  # spine1 -> spine2
    (12, 13),  # spine2 -> spine3
    (13, 14),  # spine3 -> spine4
    (14, 15),  # spine4 -> right_hip
    (15, 16),  # right_hip -> right_knee
    (16, 17),  # right_knee -> right_ankle
    (14, 18),  # spine4 -> left_hip
    (18, 19),  # left_hip -> left_knee
    (19, 20)  # left_knee -> left_ankle
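As a small illustration of how this edge list can be used, the sketch below computes the length of every bone for one 21-joint pose. The edges and joint names are copied from the list above. The function name `bone_lengths` and the expected `(21, 3)` input shape are assumptions made for this example.

```python
import numpy as np

# Edges copied from the list above: (start_joint_index, end_joint_index).
LIMBS = [
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 5),
    (1, 6), (6, 7), (7, 8), (8, 9),
    (1, 10), (10, 11), (11, 12), (12, 13), (13, 14),
    (14, 15), (15, 16), (16, 17),
    (14, 18), (18, 19), (19, 20),
]

# Joint names by index, taken from the comments in the list above.
JOINT_NAMES = [
    "head_center", "neck",
    "right_clavicle", "right_shoulder", "right_elbow", "right_wrist",
    "left_clavicle", "left_shoulder", "left_elbow", "left_wrist",
    "spine0", "spine1", "spine2", "spine3", "spine4",
    "right_hip", "right_knee", "right_ankle",
    "left_hip", "left_knee", "left_ankle",
]


def bone_lengths(joints_3d):
    """Map 'start -> end' to the Euclidean length of each bone for one (21, 3) pose."""
    joints_3d = np.asarray(joints_3d, dtype=float)
    if joints_3d.shape != (len(JOINT_NAMES), 3):
        raise ValueError("expected an array of shape (21, 3)")
    return {
        f"{JOINT_NAMES[a]} -> {JOINT_NAMES[b]}": float(np.linalg.norm(joints_3d[b] - joints_3d[a]))
        for a, b in LIMBS
    }
```

Bone lengths that stay roughly constant across frames are a quick sanity check that a sequence was loaded and indexed correctly.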

Important Note

This dataset is for non-commercial research purposes only. Due to public interest, I decided to reimplement the data generation pipeline from scratch to collect the GTA-IM dataset again. I do not use Facebook resources to reproduce the data.


We believe in open research and we will be happy if you find this data useful. If you use it, please consider citing our work.

  author = {Zhe Cao and
    Hang Gao and
    Karttikeya Mangalam and
    Qizhi Cai and
    Minh Vo and
    Jitendra Malik},
  title = {Long-term human motion prediction with scene context},
  booktitle = ECCV,
  year = {2020},


Our data collection pipeline was built upon this plugin and this tool.


Our project is released under CC-BY-NC 4.0.
