Need help with grid-feats-vqa?
Click the “chat” button below for chat support from the developer who created it, or find similar developers for support.

About the developer

facebookresearch
133 Stars 21 Forks Apache License 2.0 14 Commits 3 Opened issues

Description

Grid features pre-training code for visual question answering

Services available

!
?

Need anything else?

Contributors list

In Defense of Grid Features for Visual Question Answering

Grid Feature Pre-Training Code

This is a feature pre-training code release of the paper:

@InProceedings{jiang2020defense,
  title={In Defense of Grid Features for Visual Question Answering},
  author={Jiang, Huaizu and Misra, Ishan and Rohrbach, Marcus and Learned-Miller, Erik and Chen, Xinlei},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}
For more sustained maintenance, we release code using Detectron2 instead of mask-rcnn-benchmark which the original code is based on. The current repository should reproduce the results reported in the paper, e.g., reporting ~72.5 single model VQA score for a X-101 backbone paired with MCAN-large.

Installation

Install Detectron 2 following INSTALL.md. Since Detectron 2 is also being actively updated which can result in breaking behaviors, it is highly recommended to install via the following command:

bash
python -m pip install 'git+https://github.com/facebookresearch/[email protected]'
Commits before or after
ffff8ac
might also work, but it could be risky. Then clone this repository:
bash
git clone [email protected]:facebookresearch/grid-feats-vqa.git
cd grid-feats-vqa

Data

Visual Genome

train+val
splits released from the bottom-up-attention code are used for pre-training, and
test
split is used for evaluating detection performance. All of them are prepared in COCO format but include an additional field for
attribute
prediction. We provide the
.json
files here which can be directly loaded by Detectron2. Same as in Detectron2, the expected dataset structure under the
DETECTRON2_DATASETS
(default is
./datasets
relative to your current working directory) folder should be:
visual_genome/
  annotations/
    visual_genome_{train,val,test}.json
  images/
    # visual genome images (~108K)

Training

Once the dataset is setup, to train a model, run (by default we use 8 GPUs):

bash
python train_net.py --num-gpus 8 --config-file 
For example, to launch grid-feature pre-training with ResNet-50 backbone on 8 GPUs, one should execute:
bash
python train_net.py --num-gpus 8 --config-file configs/R-50-grid.yaml
The final model by default should be saved under
./output
of your current working directory once it is done training. We also provide the region-feature pre-training configuration
configs/R-50-updn.yaml
for reference. Note that we use
0.2
attribute loss (
MODEL.ROI_ATTRIBUTE_HEAD.LOSS_WEIGHT = 0.2
), which is better for down-stream tasks like VQA per our analysis.

We also release the configuration (

configs/R-50-updn.yaml
) for training the region features described in bottom-up-attention paper, which is a faithful re-implementation of the original one in Detectron2.

Feature Extraction

Grid feature extraction can be done by simply running once the model is trained (or you can directly download our pre-trained models, see below):

bash
python extract_grid_feature.py -config-file configs/R-50-grid.yaml --dataset 
and the code will load the final model from
cfg.OUTPUT_DIR
(which one can override in command line) and start extracting features for
, we provide three options for the dataset: 
coco_2014_train
,
coco_2014_val
and
coco_2015_test
, they correspond to
train
,
val
and
test
splits of the VQA dataset. The extracted features can be conveniently loaded in Pythia.

To extract features on your customized dataset, you may want to dump the image information into COCO

.json
format, and add the dataset information to use
extract_grid_feature.py
, or you can hack
extract_grid_feature.py
and directly loop over images.

Pre-Trained Models and Features

We release several pre-trained models for grid features: one with R-50 backbone, one with X-101, one with X-152, and one with additional improvements used for the 2020 VQA Challenge (see

X-152-challenge.yaml
). The models can be used directly to extract features. For your convenience, we also release the pre-extracted features for direct download.

| Backbone | AP50:95 | Download | | -------- | ---- | -------- | | R-50 | 3.1 | model |  metrics |  features | | X-101 | 4.3 | model |  metrics |  features | | X-152 | 4.7 | model |  metrics |  features | | X-152++ | 3.7 | model |  metrics |  features |

License

The code is released under the Apache 2.0 license.

We use cookies. If you continue to browse the site, you agree to the use of cookies. For more information on our use of cookies please see our Privacy Policy.