CaGNet: Context-aware Feature Generation for Zero-shot Semantic Segmentation

Code for our ACM MM 2020 paper "Context-aware Feature Generation for Zero-shot Semantic Segmentation".

Created by Zhangxuan Gu, Siyuan Zhou, Li Niu*, Zihan Zhao, Liqing Zhang*.

Paper Link: [arXiv]

News

In our journal extension CaGNetv2 [arXiv, github], we extend pixel-wise feature generation and finetuning to patch-wise feature generation and finetuning.

Visualization on Pascal-VOC

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{Gu2020CaGNet,
  title={Context-aware Feature Generation for Zero-shot Semantic Segmentation},
  author={Zhangxuan Gu and Siyuan Zhou and Li Niu and Zihan Zhao and Liqing Zhang},
  booktitle={ACM International Conference on Multimedia},
  year={2020}
}

Introduction

Existing semantic segmentation models rely heavily on dense pixel-wise annotations. To reduce the annotation burden, we focus on a challenging task named zero-shot semantic segmentation, which aims to segment unseen objects with zero annotations. This can be achieved by transferring knowledge across categories via semantic word embeddings. In this paper, we propose a novel context-aware feature generation method for zero-shot segmentation named CaGNet. In particular, observing that a pixel-wise feature depends strongly on its contextual information, we insert a contextual module into a segmentation network to capture the pixel-wise contextual information, which guides the generation of more diverse and context-aware features from semantic word embeddings. Our method achieves state-of-the-art results on three benchmark datasets for zero-shot segmentation.
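To make the idea concrete, here is a toy NumPy sketch, not the authors' implementation. It pairs one class word embedding with several per-pixel contextual codes and maps each pair to a feature, so the same category yields different features in different contexts. The latent size, the feature size, and the single linear layer with random weights are hypothetical choices for illustration; only the 300-dim embedding size comes from the settings listed in this README.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 300    # word-embedding size (300-dim, as listed in this README)
LATENT_DIM = 16    # hypothetical size of a per-pixel contextual code
FEATURE_DIM = 256  # hypothetical size of a pixel-wise feature

# Hypothetical generator: one linear layer followed by ReLU, random weights.
W = rng.standard_normal((EMBED_DIM + LATENT_DIM, FEATURE_DIM)) * 0.01
b = np.zeros(FEATURE_DIM)


def generate_features(word_embeddings, context_codes):
    """Map (word embedding, contextual code) pairs to generated features."""
    inputs = np.concatenate([word_embeddings, context_codes], axis=1)
    return np.maximum(inputs @ W + b, 0.0)


# One class embedding paired with three different contextual codes.
embedding = rng.standard_normal((1, EMBED_DIM))
codes = rng.standard_normal((3, LATENT_DIM))
features = generate_features(np.repeat(embedding, 3, axis=0), codes)
print(features.shape)  # (3, 256)
```

In this sketch the contextual code plays the role that random noise would play in a plain conditional generator; varying it varies the generated feature while the category stays fixed.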

Overview of Our CaGNet

Experiments

Basic Settings

  • Inductive or Transductive:

    Inductive -> No test samples (images and annotations) are available during training (including finetuning).

  • Generalized or Non-generalized:

    Generalized -> Both seen and unseen categories can appear in test samples.

  • Baselines:

    SPNet [github, paper] & ZS3Net [github, paper]

  • Backbone Network:

    DeepLabV2 (ResNet-101) pre-trained on ImageNet (following SPNet)

  • Semantic Word Embedding:

    Word2vec (300-dim) & FastText (300-dim)

  • Datasets:

    • Pascal-Context

      Samples: 4998 train / 5105 test

      Split: 33 classes including 29 seen / 4 unseen "cow, motorbike, sofa, cat"

    • COCO-Stuff

      Samples: 118288 train / 5001 test

      Split: 182 classes including 167 seen / 15 unseen (following SPNet)

    • Pascal-VOC and SBD (Semantic Boundary Dataset)

      Samples: 11685 train / 1449 test

      Split: 20 classes including 15 seen / 5 unseen (following SPNet)

  • "Background" or Not:

    ZS3Net uses the word embedding of "background" as the semantic representation of all categories (e.g., sky and ground) belonging to "background", which seems a little unreasonable, while SPNet ignores "background" in both training and testing. Although including "background" can bring a large performance gain, we follow SPNet and ignore it throughout.

  • Additional Operation on Train Samples:

    Since train images may contain pixels that do not belong to seen categories (e.g. unseen categories, background, or no label), we mark the annotations of these pixels as 'ignored' so that only seen categories are visible during training (including finetuning).
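The masking step described in the last item can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `mask_to_seen`, the ignore index 255, and the toy class ids are assumptions, so check the actual configs for the real values.

```python
import numpy as np

IGNORE_LABEL = 255  # assumed ignore index; the repository may use another value


def mask_to_seen(annotation, seen_classes, ignore_label=IGNORE_LABEL):
    """Return a copy of `annotation` in which every pixel whose label is
    not a seen class is replaced by `ignore_label`."""
    annotation = np.asarray(annotation)
    keep = np.isin(annotation, list(seen_classes))
    return np.where(keep, annotation, ignore_label)


# Toy 2x3 label map: classes 0, 1, 2 are seen, class 7 is unseen,
# and 255 already marks unlabeled pixels.
label_map = np.array([[0, 1, 7],
                      [7, 2, 255]])
masked = mask_to_seen(label_map, seen_classes={0, 1, 2})
print(masked)
```

A loss that skips `ignore_label` pixels then receives gradients only from seen-category pixels, which is what keeps the setting inductive.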

Results

“ST” in the following tables stands for self-training mentioned in ZS3Net.

Our Results on Pascal-Context dataset

| Method | hIoU | mIoU | pixel acc. | mean acc. | S-mIoU | U-mIoU |
| :-----------: | :--------: | :--------: | :--------: | :--------: | :--------: | :--------: |
| SPNet | 0 | 0.2938 | 0.5793 | 0.4486 | 0.3357 | 0 |
| SPNet-c | 0.0718 | 0.3079 | 0.5790 | 0.4488 | 0.3514 | 0.0400 |
| ZS3Net | 0.1246 | 0.3010 | 0.5710 | 0.4442 | 0.3304 | 0.0768 |
| CaGNet | 0.2061 | 0.3347 | 0.5924 | 0.4900 | 0.3610 | 0.1442 |
| ZS3Net+ST | 0.1488 | 0.3102 | 0.5725 | 0.4532 | 0.3398 | 0.0953 |
| CaGNet+ST | 0.2252 | 0.3352 | 0.5961 | 0.4962 | 0.3644 | 0.1630 |

Our Results on COCO-Stuff dataset

| Method | hIoU | mIoU | pixel acc. | mean acc. | S-mIoU | U-mIoU |
| :-----------: | :--------: | :--------: | :--------: | :--------: | :--------: | :--------: |
| SPNet | 0.0140 | 0.3164 | 0.5132 | 0.4593 | 0.3461 | 0.0070 |
| SPNet-c | 0.1398 | 0.3278 | 0.5341 | 0.4363 | 0.3518 | 0.0873 |
| ZS3Net | 0.1495 | 0.3328 | 0.5467 | 0.4837 | 0.3466 | 0.0953 |
| CaGNet | 0.1819 | 0.3345 | 0.5658 | 0.4845 | 0.3549 | 0.1223 |
| ZS3Net+ST | 0.1620 | 0.3367 | 0.5631 | 0.4862 | 0.3489 | 0.1055 |
| CaGNet+ST | 0.1946 | 0.3372 | 0.5676 | 0.4854 | 0.3555 | 0.1340 |

Our Results on Pascal-VOC dataset

| Method | hIoU | mIoU | pixel acc. | mean acc. | S-mIoU | U-mIoU |
| :-----------: | :--------: | :--------: | :--------: | :--------: | :--------: | :--------: |
| SPNet | 0.0002 | 0.5687 | 0.7685 | 0.7093 | 0.7583 | 0.0001 |
| SPNet-c | 0.2610 | 0.6315 | 0.7755 | 0.7188 | 0.7800 | 0.1563 |
| ZS3Net | 0.2874 | 0.6164 | 0.7941 | 0.7349 | 0.7730 | 0.1765 |
| CaGNet | 0.3972 | 0.6545 | 0.8068 | 0.7636 | 0.7840 | 0.2659 |
| ZS3Net+ST | 0.3328 | 0.6302 | 0.8095 | 0.7382 | 0.7802 | 0.2115 |
| CaGNet+ST | 0.4366 | 0.6577 | 0.8164 | 0.7560 | 0.7859 | 0.3031 |

Please note that our reproduced results of SPNet on the Pascal-VOC dataset were obtained using their released model and code with careful tuning, but are still lower than their reported results.
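The README does not define the column names. Assuming hIoU is the harmonic mean of S-mIoU (seen classes) and U-mIoU (unseen classes), which is the usual convention in generalized zero-shot work, the small helper below shows how the column relates to the other two. The function name is hypothetical, and table entries may differ from a recomputation in the last digits because the published values are rounded.

```python
def harmonic_iou(seen_miou, unseen_miou):
    """Harmonic mean of seen-class and unseen-class mIoU."""
    total = seen_miou + unseen_miou
    if total == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * seen_miou * unseen_miou / total


# CaGNet on Pascal-Context: S-mIoU 0.3610, U-mIoU 0.1442.
print(round(harmonic_iou(0.3610, 0.1442), 4))  # 0.2061

# A model that never predicts unseen classes scores 0 regardless of S-mIoU.
print(harmonic_iou(0.3357, 0.0))  # 0.0
```

Because the harmonic mean is dominated by the smaller term, a method cannot reach a high hIoU by performing well on seen classes alone.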

Hardware Dependency

Our released code currently supports training on a single GPU or on multiple GPUs. For satisfactory training results, we advise that each GPU have at least 32GB of memory and that the batch size be at least 8.

The results in the conference paper and in this repository were obtained on a single 32GB GPU with batch size 8. If you train CaGNet on multiple GPUs (each ≥ 32GB), you may achieve better results.

Getting Started

Installation

1. Clone this repository.

git clone https://github.com/bcmi/CaGNet-Zero-Shot-Semantic-Segmentation.git

2. Create a Python environment for CaGNet via conda.

conda env create -f CaGNet_environment.yaml

3. Download the datasets.

  • Pascal-VOC

    --> CaGNetVOC2012data.tar : BCMI-Cloud or BaiduNetDisk (extraction code: beau)

    1) download the above .tar file into directory ./dataset/voc12/

    2) uncompress it to form ./dataset/voc12/images/ and ./dataset/voc12/annotations/

  • Pascal-Context

    --> CaGNetcontextdata.tar : BCMI-Cloud or BaiduNetDisk (extraction code: rk29)

    1) download the above .tar file into directory ./dataset/context/

    2) uncompress it to form ./dataset/context/images/ and ./dataset/context/annotations/

  • COCO-Stuff

    1) follow the setup instructions on the COCO-Stuff homepage to obtain two folders: images and annotations.

    2) move the above two folders into directory ./dataset/cocostuff/ to form ./dataset/cocostuff/images/ and ./dataset/cocostuff/annotations/

4. Download the pre-trained weights and our best models into directory ./trained_models/

  • deeplabv2 pretrained weight for Pascal-VOC and Pascal-Context

    --> deeplabv2resnet101init.pth : BCMI-Cloud or BaiduNetDisk (extraction code: 5o0m)

  • SPNet pretrained weight for COCO-Stuff

    --> spnetcocostuffinit.pth : BCMI-Cloud or BaiduNetDisk (extraction code: qjpo)

  • our best model on Pascal-VOC

    --> voc12_ourbest.pth : BCMI-Cloud or BaiduNetDisk (extraction code: nxj4)

  • our best model on Pascal-Context

    --> context_ourbest.pth : BCMI-Cloud or BaiduNetDisk (extraction code: 0x2i)

  • our best model on COCO-Stuff

    --> cocostuff_ourbest.pth : BCMI-Cloud or BaiduNetDisk (extraction code: xl88)
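Before training or testing, it can help to confirm that the layout from steps 3 and 4 is in place. The script below is a small convenience sketch and not part of the repository; the helper name `missing_paths` is hypothetical. It checks only the dataset folders and the three `*_ourbest.pth` files, whose names also appear in the testing commands; the initialization weights are left out because their exact filenames should be taken from the download pages.

```python
from pathlib import Path

# Directories that steps 3.1-3.3 are expected to produce.
EXPECTED_DIRS = [
    "dataset/voc12/images",
    "dataset/voc12/annotations",
    "dataset/context/images",
    "dataset/context/annotations",
    "dataset/cocostuff/images",
    "dataset/cocostuff/annotations",
]

# Released models referenced by the testing commands.
EXPECTED_FILES = [
    "trained_models/voc12_ourbest.pth",
    "trained_models/context_ourbest.pth",
    "trained_models/cocostuff_ourbest.pth",
]


def missing_paths(repo_root="."):
    """Return the expected directories and files that do not exist yet."""
    root = Path(repo_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    for path in missing_paths("."):
        print("missing:", path)
```

Run it from the repository root; no output means every listed path was found.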

Training

1. Train on the Pascal-VOC dataset

python train.py --config ./configs/voc12.yaml --schedule step1
python train.py --config ./configs/voc12_finetune.yaml --schedule mixed

2. Train on the Pascal-Context dataset

python train.py --config ./configs/context.yaml --schedule step1
python train.py --config ./configs/context_finetune.yaml --schedule mixed

3. Train on the COCO-Stuff dataset

python train.py --config ./configs/cocostuff.yaml --schedule step1
python train.py --config ./configs/cocostuff_finetune.yaml --schedule mixed
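All three datasets follow the same two-stage pattern: a `step1` run with the base config, then a `mixed` run with the `_finetune` config. The sketch below only assembles those command lines from a dataset name so they can be scripted; the helper `training_commands` is hypothetical and the flags are exactly the ones shown above.

```python
import shlex

DATASETS = ("voc12", "context", "cocostuff")


def training_commands(dataset):
    """Return the two training commands for `dataset`, in the order they
    should be run: base training first, then finetuning."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [
        ["python", "train.py", "--config", f"./configs/{dataset}.yaml",
         "--schedule", "step1"],
        ["python", "train.py", "--config", f"./configs/{dataset}_finetune.yaml",
         "--schedule", "mixed"],
    ]


for command in training_commands("voc12"):
    print(shlex.join(command))
```

To execute the stages instead of printing them, each list can be passed to `subprocess.run(command, check=True)`, which stops before finetuning if the first stage fails.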

Testing

1. Test our best model on the Pascal-VOC dataset

python train.py --config ./configs/voc12.yaml --init_model ./trained_models/voc12_ourbest.pth --val

2. Test our best model on the Pascal-Context dataset

python train.py --config ./configs/context.yaml --init_model ./trained_models/context_ourbest.pth --val

3. Test our best model on the COCO-Stuff dataset

python train.py --config ./configs/cocostuff.yaml --init_model ./trained_models/cocostuff_ourbest.pth --val

Visualization

COMING SOON !

Try on Custom Data

COMING SOON !

Acknowledgement

Some of the code is built upon FUNIT and SPNet. We thank the authors for their great work!

If you run into any problems or find any bugs, don't hesitate to comment on GitHub or make a pull request!

CaGNet is freely available for non-commercial use, and may be redistributed under these conditions. For commercial queries, please send us an e-mail and we will send you the detailed agreement.
