


Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]


This repository is for X-Linear Attention Networks for Image Captioning (CVPR 2020). The original paper can be found here.
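The core idea of the paper is an attention block built on bilinear pooling: queries and keys are fused by an element-wise product, then a softmax distribution attends over regions while a sigmoid gate attends over channels. Below is a minimal numpy sketch of that idea, not this repository's implementation; all weight matrices are random stand-ins for learned parameters, and the function and variable names are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def x_linear_attention(q, K, V, rng):
    """Simplified single-head X-linear attention sketch.

    q: (d,) query; K, V: (n, d) region keys/values.
    The weight matrices here are random placeholders for learned ones.
    """
    n, d = K.shape
    Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
    w_s = rng.standard_normal(d) * 0.1      # spatial attention vector
    Wc = rng.standard_normal((d, d)) * 0.1  # channel attention matrix

    # Bilinear joint representations: element-wise product of projections.
    Bk = sigmoid(K @ Wk.T) * sigmoid(q @ Wq.T)  # (n, d)
    Bv = sigmoid(V @ Wv.T) * sigmoid(q @ Wq.T)  # (n, d)

    # Spatial attention: one score per region, softmax-normalised.
    beta_s = softmax(np.maximum(Bk, 0) @ w_s)   # (n,)

    # Channel attention: sigmoid gate from the pooled joint representation.
    beta_c = sigmoid(Bk.mean(axis=0) @ Wc.T)    # (d,)

    # Channel gate applied to the spatially pooled bilinear values.
    return beta_c * (beta_s @ Bv)               # (d,)

rng = np.random.default_rng(0)
out = x_linear_attention(rng.standard_normal(8),
                         rng.standard_normal((5, 8)),
                         rng.standard_normal((5, 8)), rng)
print(out.shape)  # (8,)
```

The paper stacks such blocks (and higher-order variants) inside both the encoder and the decoder; this sketch only shows the single-block shape of the computation.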

Please cite with the following BibTeX:

@inproceedings{pan2020x,
  title={X-Linear Attention Networks for Image Captioning},
  author={Pan, Yingwei and Yao, Ting and Li, Yehao and Mei, Tao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2020}
}


Data preparation

  1. Download the bottom up features and convert them to npz files

    python2 tools/ --infeats bottom_up_tsv --outfolder ./mscoco/feature/up_down_10_100
  2. Download the annotations into the mscoco folder. See self-critical.pytorch for more details on data preparation.

  3. Download coco-caption and set the path of _C.INFERENCE.COCOPATH in lib/

  4. The pretrained models and results can be downloaded here.

  5. The pretrained SENet-154 model can be downloaded here.
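Step 1 converts each image's bottom-up region features into its own npz file. A minimal sketch of that conversion is below; the `features` key name and the per-image file layout are assumptions, so check what this repo's data loader actually expects.

```python
import numpy as np

def save_region_features(image_id, features, out_folder):
    """Save one image's region features as a compressed .npz file.

    features: (num_regions, feat_dim) array of bottom-up features.
    The 'features' key name is an assumption, not confirmed from the repo.
    """
    out_path = f"{out_folder}/{image_id}.npz"
    np.savez_compressed(out_path, features=features.astype(np.float32))
    return out_path

# Usage: one fake image with 10 regions of 2048-d features.
feats = np.random.rand(10, 2048).astype(np.float32)
path = save_region_features(123456, feats, ".")
loaded = np.load(path)["features"]
print(loaded.shape)  # (10, 2048)
```

Storing one compressed file per image keeps the dataset loadable region-set by region-set instead of reading one huge TSV at training time.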


Train X-LAN model

bash experiments/xlan/

Train X-LAN model using self critical

Copy the pretrained model into experiments/xlanrl/snapshot and run the script:

bash experiments/xlanrl/
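"Self critical" here refers to self-critical sequence training (SCST): the reward (e.g. CIDEr) of a greedy-decoded caption serves as the baseline for the reward of a sampled caption, and the advantage weights the sampled caption's log-probability. Below is a minimal numpy sketch of that objective for a single caption, not this repo's training loop; the function name and scalar-reward interface are illustrative.

```python
import numpy as np

def scst_loss(sample_logprobs, sample_reward, greedy_reward):
    """Self-critical sequence training loss for one sampled caption.

    sample_logprobs: per-token log-probabilities of the sampled caption.
    The greedy caption's reward acts as the baseline, so samples that
    beat the greedy decode have their log-probability pushed up.
    """
    advantage = sample_reward - greedy_reward
    return float(-advantage * np.sum(sample_logprobs))

# A sample scoring above the greedy baseline: positive advantage times
# a negative log-probability sum gives a positive loss, and minimising
# it increases the sampled caption's probability.
loss = scst_loss(np.array([-1.2, -0.7, -0.3]),
                 sample_reward=1.1, greedy_reward=0.9)
print(round(loss, 2))  # 0.44
```

In practice this is computed per batch with CIDEr rewards from coco-caption; the greedy baseline removes the need for a learned value function.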

Train X-LAN transformer model

bash experiments/xtransformer/

Train X-LAN transformer model using self critical

Copy the pretrained model into experiments/xtransformerrl/snapshot and run the script:

bash experiments/xtransformerrl/


Test

CUDA_VISIBLE_DEVICES=0 python3 --folder experiments/model_folder --resume model_epoch


Thanks to the contributions of self-critical.pytorch and the awesome PyTorch team.
