215 Stars 72 Forks BSD 2-Clause "Simplified" License 2 Commits 2 Opened issues


Code for 3rd place solution in Kaggle Human Protein Atlas Image Classification Challenge.


To read the detailed solution, please refer to the Kaggle post.


The following specs were used to create the original solution.

- Ubuntu 16.04 LTS
- Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
- 3x NVIDIA TitanX

Reproducing Submission

To reproduce my submission without retraining, do the following steps:

1. Installation
2. Download Official Image
3. Make RGBY Images for official
4. Download Pretrained models
5. Inference
6. Make Submission


Installation

All requirements are listed in requirements.txt. Using Anaconda is strongly recommended.

conda create -n hpa python=3.6
source activate hpa
pip install -r requirements.txt

Dataset Preparation

All required files except the images are already in the data directory. If you regenerate the CSV files (duplicate image list, split, leak, ...), the original files are overwritten. Their contents will change, but that is not a problem.

Prepare Images

After downloading and converting images, the data directory is structured as:

  +- raw
  |  +- train
  |  +- test
  |  +- external
  +- rgby
  |  +- train
  |  +- test
  |  +- external

Download Official Image

Download the official archives and extract them into the data/raw directory. If the Kaggle API is installed, run the following commands.

$ kaggle competitions download -c human-protein-atlas-image-classification -f
$ kaggle competitions download -c human-protein-atlas-image-classification -f
$ mkdir -p data/raw
$ unzip -d data/raw/train
$ unzip -d data/raw/test

Download External Images

To download the external images, run the following command. The external images will be located in data/raw/external.

$ python tools/

Make RGBY Images

Both training and inference require the images to be converted to RGBY first. Run the following commands.

For official:

$ python tools/ --input_dir=data/raw/train --output_dir=data/rgby/train
$ python tools/ --input_dir=data/raw/test --output_dir=data/rgby/test

For external:

$ python tools/ --input_dir=data/raw/external --output_dir=data/rgby/external
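
As a rough illustration of what the conversion has to do first, the sketch below groups per-channel files by image id. It assumes the raw files follow the competition naming `<id>_red.png`, `<id>_green.png`, `<id>_blue.png`, `<id>_yellow.png`; `group_channels` is a hypothetical helper, not a function from this repository, and the actual pixel merging (stacking the four grayscale channels into one image) is left out.

```python
from collections import defaultdict
from pathlib import Path

# Channel order used for the merged image (an assumption for this sketch).
CHANNELS = ("red", "green", "blue", "yellow")


def group_channels(filenames):
    """Group files named '<id>_<color>.<ext>' by image id.

    Returns {image_id: [red, green, blue, yellow]} and skips ids that are
    missing any of the four channels.
    """
    groups = defaultdict(dict)
    for name in filenames:
        stem = Path(name).stem                      # '<id>_red'
        image_id, _, color = stem.rpartition("_")   # split on the last '_'
        if image_id and color in CHANNELS:
            groups[image_id][color] = name
    return {
        image_id: [files[c] for c in CHANNELS]
        for image_id, files in groups.items()
        if all(c in files for c in CHANNELS)
    }
```

Each returned list can then be passed to an image library of your choice to load the four channels and write a single four-channel file into data/rgby.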

Generate CSV files

You can skip this step. All CSV files are already provided in the data directory.

Duplicated Image List

The dataset contains duplicated images. To find them, run the following command. duplicates.ahash.csv and duplicates.phash.csv will be generated.

$ python tools/
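
The output filenames suggest average-hash (ahash) and perceptual-hash (phash) matching. The sketch below shows the average-hash idea only, assuming each image has already been downsampled to a small grayscale grid (real tools typically use 8x8). The function names are hypothetical and this is not the repository's implementation.

```python
def average_hash(pixels):
    """Hash a small grayscale grid: one bit per pixel, set if pixel > mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of differing bits between two integer hashes."""
    return bin(hash_a ^ hash_b).count("1")


def find_duplicates(hashes, max_distance=0):
    """Return sorted id pairs whose hashes differ by at most max_distance bits.

    hashes: dict mapping image id -> integer hash.
    """
    ids = sorted(hashes)
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if hamming_distance(hashes[first], hashes[second]) <= max_distance:
                pairs.append((first, second))
    return pairs
```

The pairwise loop is quadratic in the number of images; for a large dataset, grouping ids by identical hash value first would be a cheaper way to find exact matches.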

Split Dataset

Create two 5-fold CV sets: one for training, the other for searching augmentation policies. split.stratified.[0-4].csv and split.stratified.small.[0-4].csv will be generated.

$ python
$ python --use_external=0
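
For readers unfamiliar with stratified splitting on multi-label data, here is a deliberately simple sketch. It assumes each image has at least one label, stratifies on the rarest label of each image, and deals images round-robin into folds. How the repository actually stratifies is not described here, so treat `stratified_folds` as a hypothetical illustration.

```python
from collections import Counter, defaultdict


def stratified_folds(labels_by_id, n_folds=5):
    """Assign each image id to a fold in range(n_folds).

    labels_by_id: dict mapping image id -> non-empty list of class labels.
    Images are grouped by their rarest label, then spread evenly over folds
    so that rare classes appear in every fold where possible.
    """
    counts = Counter(
        label for labels in labels_by_id.values() for label in labels
    )
    groups = defaultdict(list)
    for image_id in sorted(labels_by_id):
        labels = labels_by_id[image_id]
        rarest = min(labels, key=lambda label: (counts[label], label))
        groups[rarest].append(image_id)

    folds = {}
    for label in sorted(groups):
        for index, image_id in enumerate(groups[label]):
            folds[image_id] = index % n_folds
    return folds
```

Sorting the ids and labels keeps the assignment deterministic, so rerunning the split gives the same folds.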

Search Data Leak

To learn more about the data leak, please refer to this post. The following command will create data_leak.ahash.csv and data_leak.phash.csv. The other leak file is already in the data directory.

$ python


In the configs directory, you can find the configurations I used to train my final models. My final submission is an ensemble of resnet34 x 5, inception-v3 and se-resnext50, but an ensemble of only inception-v3 and se-resnext50 performs better.

Search augmentation

To find a suitable augmentation policy, 256x256 images and resnet18 are used. The search takes about 2 days on a TitanX. The result will be located in the results/search directory. The policy that I used is located in the data directory.

$ python --config=configs/search.yml

Train models

To train a model, run the following command.

$ python --config={config_path}

To train all models, run

The expected training times are:


| Model | GPUs | Image size | Training epochs | Training time |
|---|---|---|---|---|
| resnet34 | 1x TitanX | 512 | 40 | 16 hours |
| inception-v3 | 3x TitanX | 1024 | 27 | 1 day 15 hours |
| se-resnext50 | 2x TitanX | 1024 | 22 | 2 days 15 hours |

Average weights

To average weights, run the following command.

$ python --config={config_path}

To average weights of all models, simply run

The averaged weights will be located in results/{train_dir}/checkpoint.
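
Weight averaging in general means taking the element-wise mean of the same parameter across several checkpoints. The sketch below assumes each checkpoint is a plain dictionary from parameter name to a flat list of floats; in practice these would be framework tensors. Which checkpoints the repository averages is not stated here, and `average_checkpoints` is a hypothetical helper.

```python
def average_checkpoints(checkpoints):
    """Element-wise mean of several {name: [floats]} weight dictionaries.

    All checkpoints are assumed to share the same parameter names and shapes.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    averaged = {}
    for name in checkpoints[0]:
        # zip(*...) walks the same position of this parameter in every checkpoint.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(col) / len(checkpoints) for col in columns]
    return averaged
```

For example, averaging `{"w": [1.0, 2.0]}` and `{"w": [3.0, 4.0]}` gives `{"w": [2.0, 3.0]}`.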

Pretrained models

You can download the pretrained models that were used for my submission from the link, or run the following commands.

$ wget
$ tar xzvf results.tar.gz

Extract them into the results directory; you should then see the following structure:

  +- resnet34.0.policy
  |  +- checkpoint
  +- resnet34.1.policy
  |  +- checkpoint
  +- resnet34.2.policy
  |  +- checkpoint
  +- resnet34.3.policy
  |  +- checkpoint
  +- resnet34.4.policy
  |  +- checkpoint
  +- inceptionv3.attention.policy.per_image_norm.1024
  |  +- checkpoint
  +- se_resnext50.attention.policy.per_image_norm.1024
  |  +- checkpoint


Inference

Once trained weights are available, you can create files that contain the class probabilities for each image.

$ python \
  --config={config_filepath} \
  --num_tta={number_of_tta_images, 4 or 8} \
  --output={output_filepath} \
  --split={test or test_val}

To make a submission, you must run inference on both the test and test_val splits. For example:

$ python --config=configs/resnet34.0.policy.yml --num_tta=8 --output=inferences/resnet34.0.test_val.csv --split=test_val
$ python --config=configs/resnet34.0.policy.yml --num_tta=8 --output=inferences/resnet34.0.test.csv --split=test

To run inference for all models, simply run
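
The `--num_tta` option refers to test-time augmentation: predicting on several transformed copies of an image and averaging the results. The sketch below illustrates the idea with flips for 4 views and flips plus a transpose for 8 views. That choice of transforms is an assumption, as are all function names; images are plain nested lists here and `model` is any callable returning per-class probabilities.

```python
def hflip(image):
    """Mirror a 2D image left to right."""
    return [list(reversed(row)) for row in image]


def vflip(image):
    """Mirror a 2D image top to bottom."""
    return [list(row) for row in reversed(image)]


def tta_views(image, num_tta=4):
    """Return 4 flipped views, or 8 when transposed views are included."""
    if num_tta not in (4, 8):
        raise ValueError("num_tta must be 4 or 8")
    views = [image, hflip(image), vflip(image), hflip(vflip(image))]
    if num_tta == 8:
        transposed = [list(row) for row in zip(*image)]
        views += [transposed, hflip(transposed),
                  vflip(transposed), hflip(vflip(transposed))]
    return views


def predict_with_tta(model, image, num_tta=4):
    """Average the model's per-class probabilities over all TTA views."""
    outputs = [model(view) for view in tta_views(image, num_tta)]
    n_classes = len(outputs[0])
    return [sum(out[c] for out in outputs) / len(outputs)
            for c in range(n_classes)]
```

Averaging over symmetric views is a reasonable fit for microscopy images, which have no canonical orientation.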

Make Submission

The following command ensembles all models and creates the submission files.

$ python

If you don't want to use all of the models, modify the script. For example, if you want to use only inception-v3 and se-resnext50, modify test_val_filenames, test_filenames and weights in the script:

test_val_filenames = ['inferences/inceptionv3.0.test_val.csv',
                      'inferences/se_resnext50.0.test_val.csv']

test_filenames = ['inferences/inceptionv3.0.test.csv',
                  'inferences/se_resnext50.0.test.csv']

weights = [1.0, 1.0]

The command generates two files: the original submission, and a version modified using the data leak.

- submissions/submission.csv
- submissions/submission.csv.leak.csv
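
To make the role of the weights concrete, the sketch below shows a weighted average of per-class probabilities from several models, followed by conversion to the space-separated label string that the competition expects in the Predicted column. The fixed 0.5 threshold and the fall-back to the single most likely class are assumptions for illustration; the function names are hypothetical and the repository's own thresholding may differ.

```python
def ensemble(prob_lists, weights):
    """Weighted average of per-class probability lists, one list per model."""
    total = sum(weights)
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]


def to_predicted(probs, threshold=0.5):
    """Return class indices above the threshold as a space-separated string.

    If no class passes the threshold, fall back to the most likely class
    (an assumption made for this sketch).
    """
    labels = [i for i, p in enumerate(probs) if p > threshold]
    if not labels:
        labels = [max(range(len(probs)), key=probs.__getitem__)]
    return " ".join(str(i) for i in labels)
```

With equal weights such as `[1.0, 1.0]`, this reduces to a plain mean of the two models' probabilities.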
