
Conditional Probability Models for Deep Image Compression

[Paper] [Citation]

TensorFlow implementation of Conditional Probability Models for Deep Image Compression, published in CVPR 2018.

Prerequisites

- Download checkpoints/pre-trained models here and extract them to `ckpts`
- Python 3 (tested with Python 3.4.5)
- TensorFlow (tested with tensorflow-gpu version 1.4.1)
- Python packages as specified by `requirements.txt` (`pip install -r requirements.txt`)
- A CUDA-compatible GPU
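To quickly confirm that your environment matches what the code was tested with, the following sketch (not part of the repo) prints the TensorFlow version and the visible GPUs using the TF 1.x `device_lib` API:

    import tensorflow as tf
    from tensorflow.python.client import device_lib

    # Sanity check: the code was tested with tensorflow-gpu 1.4.1, and
    # inference requires a CUDA-compatible GPU to be visible.
    print('TensorFlow version:', tf.__version__)
    gpus = [d.name for d in device_lib.list_local_devices() if d.device_type == 'GPU']
    print('GPUs found:', gpus)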

Notes about naming in code vs. paper

- `qbar` is the output of Eq. (4), i.e., `qhard` in the forward pass and `qsoft` in the backward pass, where `qhard` corresponds to Eq. (2) and `qsoft` to Eq. (3).
- We quantize `z` to one value in the centers `C`. We refer to the index in `C` as `symbols`. So if, e.g., `C = {-2, -1, 0, 1, 2}` and `z = 0.75`, `z` is quantized to `C[3] = 1`, making `qhard = 1` and `symbol = 3` (indices start from 0); see the sketch after this list.
- Our context model (Fig. 3) is called a probability classifier in the code, since it resembles a classifier (predicting the symbols). The relevant file is `probclass.py`, which is frequently abbreviated to `pc`.
- The auto-encoder is found in `autoencoder.py` and abbreviated `ae`.
- The importance map is called heatmap in the code.
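As an illustration of the naming above, here is a minimal NumPy sketch (not the repo's code; `sigma` and the exact form of the soft assignment are assumptions following Eq. (3) of the paper):

    import numpy as np

    def quantize(z, centers, sigma=1.0):
        dist = np.abs(z[..., None] - centers)   # distance of each z to every center c_j
        symbols = np.argmin(dist, axis=-1)      # index into C (the "symbols")
        qhard = centers[symbols]                # Eq. (2): nearest center
        w = np.exp(-sigma * dist)
        w /= w.sum(axis=-1, keepdims=True)      # softmax over -sigma * dist
        qsoft = (w * centers).sum(axis=-1)      # Eq. (3): soft assignment
        # Eq. (4): qbar takes qhard in the forward pass and qsoft's gradient in
        # the backward pass (straight-through); plain NumPy only shows the values.
        return symbols, qhard, qsoft

    centers = np.array([-2., -1., 0., 1., 2.])
    symbols, qhard, qsoft = quantize(np.array([0.75]), centers)
    print(symbols, qhard)  # symbol 3, qhard 1.0, matching the example above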

Inference

[Figure: MS-SSIM vs. bpp on Kodak, showing points A, B, C]

NOTE: Inference only works on CUDA-compatible GPUs.


To do inference, use the following command:

    python val.py ../ckpts MODEL_ID DATASET --save_ours

where

- `MODEL_ID` is one of
  - `0515_1103`: Point A in the plot (on Kodak: bpp 0.370, MS-SSIM 0.975)
  - `0515_1309`: Point B in the plot (on Kodak: bpp 0.677, MS-SSIM 0.987)
  - `0515_1310`: Point C in the plot (on Kodak: bpp 1.051, MS-SSIM 0.992)
- `DATASET` is either the path to a directory of png files or an escaped glob (e.g., `some/images/\*/\*.jpg`). All images readable with PIL should be supported.

This will save outputs in `ckpts/MODEL_ID\ DATASET/imgs` and display the mean bpp and MS-SSIM on the console. Detailed measures per image are written to `ckpts/MODEL_ID\ DATASET/measures.csv`. Note that some images may be padded.
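If you want to aggregate `measures.csv` yourself, here is a minimal sketch (the exact column names depend on the file, so none are hard-coded; `MODEL_ID` and `DATASET` are the placeholders from above):

    import csv

    # Average every numeric column of the per-image measures written by val.py.
    with open('ckpts/MODEL_ID DATASET/measures.csv') as f:
        rows = list(csv.DictReader(f))
    for key in rows[0]:
        try:
            vals = [float(r[key]) for r in rows]
        except ValueError:
            continue  # skip non-numeric columns, e.g. the image name
        print('{}: mean {:.4f} over {} images'.format(key, sum(vals) / len(vals), len(vals)))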

Encoding to bitstream

By default, `val.py` will use cross entropy to estimate the actual bitrate. In our experiments, this is very close to the real bitrate (<0.1% difference for most images). To evaluate this yourself, you can use

    python val.py ../ckpts MODEL_ID DATASET --save_ours --real_bpp

which will use an arithmetic encoder to write the symbols of an image to a file, count the number of bits, and then decode the bits to restore the symbols. We note that this is not optimized at all (images from the Kodak validation set take ~350s to encode and ~200s to decode). For a practical implementation, the following should be done:

- A faster arithmetic encoder should be used (we use the clean but non-optimized code from here).
- The probability classifier network should output the logits for all symbols in parallel, instead of sequentially.
- Decoding should re-use activations, as in Fast PixelCNN++, which achieves speedups of up to 183x.
- Like in classical approaches, the image could be split into blocks and those blocks could be encoded in parallel.
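For reference, the default cross-entropy estimate amounts to summing `-log2` of the model's probability for each coded symbol and dividing by the number of pixels. A hypothetical helper (not the repo's code) illustrating the idea:

    import numpy as np

    def estimate_bpp(symbol_probs, num_pixels):
        # symbol_probs: model probability P(symbol_i | context) for each
        # actually-coded symbol; num_pixels: H * W of the (padded) image.
        total_bits = -np.sum(np.log2(symbol_probs))
        return total_bits / num_pixels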

Plot

The plot above was created using:

    python plotter.py ../ckpts 0515_1103,0515_1309,0515_1310 kodak --style mean --ids A B C --latex

For reference, the curve corresponding to our model in Fig. 1 in the paper can be reproduced with the following data:

# bpp -> MS-SSIM on Kodak
CVPR_FIG1 = [
    (0.1265306, 0.9289356),
    (0.1530612, 0.9417454),
    (0.1795918, 0.9497924),
    (0.2061224, 0.9553684),
    (0.2326531, 0.9598574),
    (0.2591837, 0.9636625),
    (0.2857143, 0.9668663),
    (0.3122449, 0.9695684),
    (0.3387755, 0.9718446),
    (0.3653061, 0.9738012),
    (0.3918367, 0.9755308),
    (0.4183673, 0.9770696),
    (0.4448980, 0.9784622),
    (0.4714286, 0.9797252),
    (0.4979592, 0.9808753),
    (0.5244898, 0.9819255),
    (0.5510204, 0.9828875),
    (0.5775510, 0.9837722),
    (0.6040816, 0.9845877),
    (0.6306122, 0.9853407),
    (0.6571429, 0.9860362),
    (0.6836735, 0.9866768),
    (0.7102041, 0.9872690),
    (0.7367347, 0.9878184),
    (0.7632653, 0.9883268),
    (0.7897959, 0.9887977),
    (0.8163265, 0.9892346),
    (0.8428571, 0.9896379)]
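To compare your own results against this curve, a minimal matplotlib sketch (assuming `CVPR_FIG1` as defined above):

    import matplotlib.pyplot as plt

    bpps, msssims = zip(*CVPR_FIG1)
    plt.plot(bpps, msssims, marker='.', label='Ours (Fig. 1)')
    plt.xlabel('bpp')
    plt.ylabel('MS-SSIM on Kodak')
    plt.legend()
    plt.show()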

Training

If you want to train on the ImageNet dataset as described in the paper, follow the steps below (Prepare ImageNET). After doing this, you can pass `--dataset_train imgnet_train --dataset_test imgnet_test` to `train.py` (make sure you set `$RECORDS_ROOT` for this, see below). Otherwise, set `--dataset_train` and `--dataset_test` to an escaped glob matching image files (e.g., `some/images/\*/\*.jpg`).

    python train.py ae_configs/cvpr/AE_CONFIG pc_configs/cvpr/PC_CONFIG \
            --dataset_train TRAIN_DATASET \
            --dataset_test TEST_DATASET \
            --log_dir_root LOG_DIR_ROOT

where `AE_CONFIG` and `PC_CONFIG` are one of the configs in the respective folders. The models in `ckpts` were obtained with the following configs:

- `0515_1103`: `ae_configs/cvpr/low` and `pc_configs/cvpr/res_shallow`
- `0515_1309`: `ae_configs/cvpr/med` and `pc_configs/cvpr/res_shallow`
- `0515_1310`: `ae_configs/cvpr/high` and `pc_configs/cvpr/res_shallow`

Various options are available for `train.py`, such as `--restore` to continue training from a previous checkpoint. See `python train.py -h`.

Prepare ImageNET

The following instructions assume that you have the following tools installed:

- GNU parallel (you can do without, but it might take a really long time); installing should be as simple as `(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash`, see e.g. here
- ImageMagick, to downscale images to 256 pixels
- `fjcommon` (`pip install fjcommon`), to create TF records

Note that creating all records will likely take several hours. Note that the following was tested using `zsh`.

1. Get ImageNET, in the proper format

You need to download `ILSVRC2012_img_train.tar` and `ILSVRC2012_img_val.tar` (a good resource is the Inception `download_imagenet.sh` script). For the following instructions, we assume both tar files are located in a directory `data`.

    # in data/
    mkdir train val

    pushd train
    tar xvf ../ILSVRC2012_img_train.tar
    popd

    pushd val
    tar xvf ../ILSVRC2012_img_val.tar
    popd

This will unpack 1000 .tar containers into `train/` and 50000 .JPEG images into `val/`. Now we need to extract the training images. This may take a while depending on your setup.
    # in data/
    pushd train
    find . -name "n*.tar" | parallel -j64 'mkdir -vp {/.} && tar xf {} -C {/.}'
    popd
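If GNU parallel is not available, a pure-Python fallback for this extraction step might look as follows (run from `data/`; much slower than the command above):

    import glob
    import os
    import tarfile

    # Extract each class tar train/nXXXXXXXX.tar into train/nXXXXXXXX/,
    # mirroring the `parallel ... tar xf {} -C {/.}` command above.
    for tar_path in glob.glob('train/n*.tar'):
        out_dir = os.path.splitext(tar_path)[0]
        os.makedirs(out_dir, exist_ok=True)
        with tarfile.open(tar_path) as t:
            t.extractall(out_dir)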

2. Downsample

We downsample each image to have 256 pixels on the shorter side, by executing the following command in `data/`. Again, this is very time-consuming, so if you have access to some CPU cluster, it might make sense to run it there.

    # in data/
    find . -name "*.JPEG" | parallel -j64 convert -verbose {} -resize "256x256^" {}
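If ImageMagick is not an option, an equivalent resize can be sketched with PIL (already used by this project for reading images); note this overwrites images in place, like the `convert` call above:

    from PIL import Image

    def resize_shorter_side(path, target=256):
        # Resize so the shorter side is `target` px, like convert -resize "256x256^".
        img = Image.open(path)
        w, h = img.size
        scale = target / min(w, h)
        img = img.resize((int(round(w * scale)), int(round(h * scale))), Image.LANCZOS)
        img.save(path)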

3. Create records

Now it's time to pack the images into TF record files. We will save them in `data/records/`:

    # in data/
    mkdir -p records/train
    mkdir -p records/val

    pushd train
    find . -name "*.JPEG" | parallel --bar -j64 -N 1250 \
        'OUTPUT_PATH=$(printf "../records/train/train-%05d.tfrecord" {#});' \
        'python -m fjcommon tf_records mk_img_rec {} -o $OUTPUT_PATH --feature_key image/encoded'
    popd

    pushd val
    find . -name "*.JPEG" | parallel --bar -j16 -N 1250 \
        'OUTPUT_PATH=$(printf "../records/val/val-%05d.tfrecord" {#});' \
        'python -m fjcommon tf_records mk_img_rec {} -o $OUTPUT_PATH --feature_key image/encoded'
    popd
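To sanity-check a record after this step, you can iterate over it with the TF 1.x API (assuming fjcommon stored the encoded image bytes under the `image/encoded` key passed above):

    import tensorflow as tf

    # Print the size of the first few encoded images in one record file.
    path = 'records/train/train-00001.tfrecord'
    for i, serialized in enumerate(tf.python_io.tf_record_iterator(path)):
        example = tf.train.Example()
        example.ParseFromString(serialized)
        encoded = example.features.feature['image/encoded'].bytes_list.value[0]
        print('image {}: {} bytes'.format(i, len(encoded)))
        if i >= 2:
            break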

4. Set RECORDS_ROOT

Make sure the following environment variable is set before running `train.py`:
    export RECORDS_ROOT=path_to_data/records

Citation

If you use this code for your research, please cite this paper:

    @inproceedings{mentzer2018conditional,
        Author = {Mentzer, Fabian and Agustsson, Eirikur and Tschannen, Michael and Timofte, Radu and Van Gool, Luc},
        Booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
        Title = {Conditional Probability Models for Deep Image Compression},
        Year = {2018}}
