Need help with few_shot_gaze?
Click the “chat” button below for chat support from the developer who created it, or find similar developers for support.

About the developer

145 Stars 29 Forks Other 40 Commits 8 Opened issues


Pytorch implementation and demo of FAZE: Few-Shot Adaptive Gaze Estimation (ICCV 2019, oral)

Services available


Need anything else?

Contributors list

# 348,579
14 commits
# 8,494
4 commits


Release 09/22/2020 * Incorporated (a) distributed data parallel trainng and (b) fusedSGD optimizer, resulting in 2x faster training.

FAZE: Few-Shot Adaptive Gaze Estimation

This repository contains the code for training and evaluation of our ICCV 2019 work, which was presented as an Oral presentation. FAZE is a framework for few-shot adaptation of gaze estimation networks, consisting of equivariance learning (via the DT-ED or Disentangling Transforming Encoder-Decoder architecture) and meta-learning with gaze-direction embeddings as input.

The FAZE Framework


Training and Evaluation

1. Datasets

Pre-process the GazeCapture and MPIIGaze datasets using the code-base at which is also available as a git submodule at the relative path,


If you have already cloned this

repository without pulling the submodules, please run:
git submodule update --init --recursive

After the dataset preprocessing procedures have been performed, we can move on to the next steps.

2. Prerequisites

This codebase should run on most standard Linux systems. We specifically used Ubuntu

Please install the following prerequisites manually (as well as their dependencies), by following the instructions found below: * PyTorch 1.3 - * NVIDIA Apex - * please note that only NVIDIA Volta and newer architectures can benefit from AMP training via NVIDIA Apex.

The remaining Python package dependencies can be installed by running:

pip3 install --user --upgrade -r requirements.txt

3. Pre-trained weights for the DT-ED architecture and MAML models

You can obtain a copy of the pre-trained weights for the Disentangling Transforming Encoder-Decoder and for the various MAML models from the following location.

cd src/
wget -N
unzip -o

4. Training, Meta-Learning, and Final Evaluation

Run the all-in-one example bash script with:

cd src/
bash full_train_test_and_plot.bash

The bash script should be self-explanatory and can be edited to replicate the final FAZE model evaluation procedure, given that hardware requirements are satisfied (8x GPUs, where each are Tesla V100 GPUs with 32GB of memory).

The pre-trained DT-ED weights should be loaded automatically by the script
. Please note that this model can take a long time to train when training from scratch, so we recommend adjusting batch sizes and the using multiple GPUs (the code is multi-GPU-ready).

The Meta-Learning step is also very time consuming, particularly because it must be run for every value of

or number of calibration samples. The code pertinent to this step is
, and its execution is recommended to be done in parallel as shown in

5. Outputs

When the full pipeline successfully runs, you will find some outputs in the path

, in particular: * walks/: mp4 videos of latent space walks in gaze direction and head orientation * ZgOLR1e-03IN5ILR1e-05Net64/: outputs of the meta-learning step. * ZgOLR1e-03IN5ILR1e-05Net64 MAML MPIIGaze.pdf: plotted results of the few-shot learning evaluations on MPIIGaze. * ZgOLR1e-03IN5ILR1e-05Net64 MAML GazeCapture (test).pdf: plotted results of the few-shot learning evaluations on the GazeCapture test set.

Realtime Demo

We also provide a realtime demo that runs with live input from a webcam in the

folder. Please check the separate demo instructions for details of how to setup and run it.


Please cite our paper when referencing or using our code.

  author    = {Seonwook Park and Shalini De Mello and Pavlo Molchanov and Umar Iqbal and Otmar Hilliges and Jan Kautz},
  title     = {Few-Shot Adaptive Gaze Estimation},
  year      = {2019},
  booktitle = {International Conference on Computer Vision (ICCV)},
  location  = {Seoul, Korea}


Seonwook Park carried out this work during his internship at NVIDIA. This work was supported in part by the ERC Grant OPTINT (StG-2016-717054).

We use cookies. If you continue to browse the site, you agree to the use of cookies. For more information on our use of cookies please see our Privacy Policy.