[WACV2021] Foreground-aware Semantic Representations for Image Harmonization https://arxiv.org/abs/2006.00809
This repository contains the official PyTorch implementation of the following paper:
Foreground-aware Semantic Representations for Image Harmonization
Konstantin Sofiiuk, Polina Popenova, Anton Konushin
Samsung AI Center Moscow
https://arxiv.org/abs/2006.00809

Abstract: Image harmonization is an important step in photo editing to achieve visual consistency in composite images by adjusting the appearances of foreground to make it compatible with background. Previous approaches to harmonize composites are based on training of encoder-decoder networks from scratch, which makes it challenging for a neural network to learn a high-level representation of objects. We propose a novel architecture to utilize the space of high-level features learned by a pre-trained classification network. We create our models as a combination of existing encoder-decoder architectures and a pre-trained foreground-aware deep high-resolution network. We extensively evaluate the proposed method on existing image harmonization benchmark and set up a new state-of-the-art in terms of MSE and PSNR metrics.
This framework is built using Python 3.6 and relies on PyTorch 1.4.0+. The following command installs all necessary packages:
pip3 install -r requirements.txt
You can also use our Dockerfile to build a container with a configured environment.
If you want to run training or testing, you must configure the paths to the datasets in config.yml.
We train and evaluate all our models on the iHarmony4 dataset. It contains 65742 training and 7404 test objects. Each object is a triple consisting of real image, composite image and foreground mask.
Before training, we resize the HAdobe5k subdataset so that each side is smaller than 1024 pixels. The resizing script is provided in resize_hdataset.ipynb.
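As a rough illustration of the resizing rule above, the sketch below computes a target size that preserves the aspect ratio and caps the longer side at a limit. The function name `fit_within`, the `max_side` parameter, and the rounding choice are assumptions made for this example; resize_hdataset.ipynb remains the reference for what is actually done.

```python
def fit_within(width, height, max_side=1024):
    """Return (new_width, new_height) with the longer side at most max_side.

    Hypothetical helper, not taken from the repository: the aspect ratio is
    preserved and images that already fit are returned unchanged.
    """
    longer = max(width, height)
    if longer <= max_side:
        return width, height
    scale = max_side / longer
    # Round to the nearest pixel, but never collapse a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(3000, 2000))
    print(fit_within(800, 600))
```

Passing `max_side=1023` would keep every side strictly below 1024, if that stricter reading is the intended one.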
Don't forget to change the paths to the datasets in config.yml after downloading and unpacking.
We provide the scripts for training our models on images of size 256 and 512. For each experiment, a separate folder is created in `./harmonization_exps` with Tensorboard logs, text logs, visualizations, and model checkpoints. You can specify another path in config.yml (see the `EXPS_PATH` variable).
Start training with the following commands:

```.bash
# General form
python3 train.py <model-script-path> --gpus=0 --workers=4 --exp-name=first-try

# Examples
python3 train.py models/fixed256/improved_dih.py --gpus=0 --workers=4 --exp-name=first-try
python3 train.py models/fixed256/hrnet18_idih.py --gpus=0 --workers=4 --exp-name=first-try
python3 train.py models/crop512/improved_dih.py --gpus=0 --workers=4 --exp-name=first-try
```
To see all training parameters, run `python3 train.py --help`.
We used pre-trained HRNetV2 models from the official repository. To train one of our models with an HRNet backbone, download the HRNet weights and specify their path in config.yml (see the `IMAGENET_PRETRAINED_MODELS` variable).
We provide scripts to both evaluate and get predictions from any model. To do that, we specify all model configs in mconfigs. To evaluate a model other than the provided ones, add a new config entry there.
You can specify the checkpoints path in config.yml (see the `MODELS_PATH` variable) in advance and then pass the scripts only a checkpoint name instead of an absolute checkpoint path.
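The lookup described above can be pictured with a small sketch. It only illustrates the idea of resolving a checkpoint name against a configured directory. The function `resolve_checkpoint` and the rule of appending `.pth` when no suffix is given are assumptions for this example, not the repository's actual logic.

```python
from pathlib import Path


def resolve_checkpoint(checkpoint, models_path):
    """Turn a checkpoint name or path into a full path.

    Hypothetical helper: absolute paths are used as given; anything else
    is looked up under models_path, adding ".pth" when no suffix is present.
    """
    path = Path(checkpoint)
    if path.is_absolute():
        return path
    if path.suffix == "":
        path = path.with_suffix(".pth")
    return Path(models_path) / path
```

With `models_path` set to the value of `MODELS_PATH`, a short name such as `hrnet18_idih256` would then map to a file inside that directory, while an absolute path is passed through untouched.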
To get the metrics table on the iHarmony4 test set, run the following command:

```.bash
# General form
python3 scripts/evaluate_model.py <model-name> <checkpoint-path> --resize-strategy Fixed256

# Example
python3 scripts/evaluate_model.py improved_dih256 /hdd0/harmonization_exps/fixed256/improved_dih/checkpoints/last_checkpoint.pth --resize-strategy Fixed256
```
To see all evaluation parameters, run `python3 scripts/evaluate_model.py --help`.
To get predictions on a set of images, run the following command:

```.bash
# General form
python3 scripts/predict_for_dir.py <model-name> <checkpoint-path> --images <path-to-images> --masks <path-to-masks> --resize 256

# Example
python3 scripts/predict_for_dir.py improved_dih256 /hdd0/harmonization_exps/fixed256/improved_dih/checkpoints/last_checkpoint.pth \
    --images /hdd0/datasets/ImageHarmonization/test/composite_images \
    --masks /hdd0/datasets/ImageHarmonization/test/masks \
    --resize 256
```
To see all prediction parameters, run `python3 scripts/predict_for_dir.py --help`.
For interactive model testing with sample visualization, see eval_and_vis_harmonization_model.ipynb.
We provide metrics and pre-trained weights for several models trained on images of size 256x256 augmented with horizontal flip and random resized crop. Metric values may differ slightly from the ones in the paper since all the models were retrained from scratch with the new codebase.
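Since each sample is a triple of real image, composite image, and foreground mask, a geometric augmentation such as horizontal flip has to be applied to all three in the same way. The sketch below shows this on images stored as nested lists of rows. The function names are hypothetical, and the training code may well rely on an augmentation library instead.

```python
import random


def hflip(image):
    """Flip an image, given as a list of rows, left to right."""
    return [list(reversed(row)) for row in image]


def random_hflip_triple(real, composite, mask, p=0.5, rng=random):
    """Flip all three inputs together with probability p, or none of them."""
    if rng.random() < p:
        return hflip(real), hflip(composite), hflip(mask)
    return real, composite, mask
```

Drawing a single random number per sample, rather than one per image, is what keeps the mask aligned with both images.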
Pre-trained models with corresponding names of model configs (see Evaluation):

| Model | Download Link | Name in mconfigs |
|:--------------------------:|:----------------------------------------------:|:-------------------------------------:|
| iDIH256 | idih256.pth | improved_dih256 |
| iSSAM256 | issam256.pth | improved_ssam256 |
| DeepLab-ResNet34 + iDIH256 | deeplab_idih256.pth | deeplab_r34_idih256 |
| HRNet18s + iDIH256 | hrnet18s_idih256.pth | hrnet18s_idih256 |
| HRNet18 + iDIH256 | hrnet18_idih256.pth | hrnet18_idih256 |
| HRNet18 pyramid + iDIH256 | hrnet18_v2p_idih256.pth | hrnet18_v2p_idih256 |
| HRNet32 + iDIH256 | hrnet32_idih256.pth | hrnet32_idih256 |
Evaluation metrics:
| Model | HCOCO MSE | HCOCO PSNR | HAdobe5k MSE | HAdobe5k PSNR | HFlickr MSE | HFlickr PSNR | Hday2night MSE | Hday2night PSNR | All MSE | All PSNR |
|:--|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| **Base models** | | | | | | | | | | |
| iDIH256 | 19.58 | 38.34 | 30.84 | 36.00 | 84.74 | 32.58 | 50.05 | 37.10 | 30.70 | 36.99 |
| iSSAM256 | 16.48 | 39.16 | 22.60 | 37.24 | 69.67 | 33.56 | 40.59 | 37.72 | 24.65 | 37.95 |
| **iDIH256 with backbone** | | | | | | | | | | |
| DeepLab-ResNet34 | 17.68 | 38.97 | 28.13 | 36.33 | 70.89 | 33.25 | 56.17 | 37.25 | 27.37 | 37.53 |
| HRNet18s | 14.30 | 39.52 | 22.57 | 37.18 | 63.03 | 33.70 | 51.20 | 37.69 | 22.82 | 38.15 |
| HRNet18 | 13.79 | 39.62 | 25.44 | 36.91 | 60.63 | 33.88 | 44.94 | 37.74 | 22.99 | 38.16 |
| HRNet18 pyramid | 14.10 | 39.56 | 24.47 | 37.04 | 62.13 | 33.90 | 47.74 | 37.46 | 23.10 | 38.15 |
| HRNet32 | 14.00 | 39.71 | 23.04 | 37.13 | 57.55 | 34.06 | 53.70 | 37.70 | 22.22 | 38.29 |
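For readers unfamiliar with the two metrics in the table, here is a minimal sketch of how MSE and PSNR are commonly defined for a single image. It assumes 8-bit pixel values (peak 255) given as flat sequences and averages over every pixel. The repository's evaluation script may differ in details such as resolution or how per-image values are aggregated.

```python
import math


def mse(pred, target):
    """Mean squared error between two equally sized flat pixel sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    error = mse(pred, target)
    if error == 0:
        return math.inf
    # PSNR = 10 * log10(peak^2 / MSE); `peak` is an assumed 8-bit maximum.
    return 10.0 * math.log10(peak ** 2 / error)
```

Lower MSE and higher PSNR both indicate a harmonized image that is closer to the real one.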
The code is released under the MPL 2.0 License. MPL is a copyleft license that is easy to comply with. You must make the source code for any of your changes available under MPL, but you can combine the MPL software with proprietary code, as long as you keep the MPL code in separate files.
If you find this work useful for your research, please cite our paper:
    @article{sofiiuk2020harmonization,
      title={Foreground-aware Semantic Representations for Image Harmonization},
      author={Konstantin Sofiiuk and Polina Popenova and Anton Konushin},
      journal={arXiv preprint arXiv:2006.00809},
      year={2020}
    }