Kaggle Humpback Whale Identification Challenge 2019 2nd place code
This is the source code for my part of the 2nd place solution to the Humpback Whale Identification Challenge hosted by Kaggle.com.
2019.03.13: code upload.
| single model | private LB |
| ---------------- | ---- |
|resnet101fold0256x512|0.9696|
|seresnet101fold0256x512|0.9691|
|seresnext101fold0256x512|0.9692|
|resnet101fold0512x512|0.9682|
|seresnet101fold0512x512|0.9664|
|seresnext101fold0512x512|-|
I generated a pseudo-label list of 1.5k samples when I reached 0.940 on the public LB, and I kept using this list until the competition ended. I used the bottleneck features of the arcface model (my baseline model) to compute the cosine distance between train and test images. For few-shot classes (fewer than 2 samples), I chose 0.65 as the threshold for keeping high-confidence samples. I think using the 0.970 LB model to find pseudo labels would give better results.
| single model | private LB |
| ---------------- | ---- |
|resnet101fold0256x512|0.9705|
|seresnet101fold0256x512|0.9704|
|seresnext101fold0256x512|-|
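The pseudo-label selection described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the solution: the function name `select_pseudo_labels` and its arguments are hypothetical, the features are assumed to be plain NumPy arrays with one row per image, and the 0.65 threshold is assumed to be a lower bound on cosine similarity to the nearest training image. The sketch applies one threshold to every class, whereas the text applies it to few-shot classes specifically.

```python
import numpy as np

def select_pseudo_labels(train_feats, train_labels, test_feats, threshold=0.65):
    """Return (test_index, label, similarity) for confident test images.

    Assumption: a test image is "confident" when its cosine similarity to
    its nearest training image is at least `threshold`.
    """
    # L2-normalise rows so that a dot product equals cosine similarity.
    train_n = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test_n = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)

    sims = test_n @ train_n.T                      # shape: (n_test, n_train)
    nearest = sims.argmax(axis=1)                  # nearest training image
    best_sim = sims[np.arange(len(test_n)), nearest]

    keep = np.flatnonzero(best_sim >= threshold)   # indices above the threshold
    return [(int(i), train_labels[nearest[i]], float(best_sim[i])) for i in keep]
```

The returned triples could then be appended to the training list as extra labelled samples; how the original code merges them is not shown here.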
| model | private LB |
| ---------------- | ---- |
|resnet101seresnet101seresnext101fold0256x512|0.97113|
|resnet101seresnet101seresnext101fold0512x512_pseudo|0.97072|
|10 models (final submission)|0.97209|
Set the following paths to your own in ./process/datahelper.py
```
PJDIR = r'/KaggleWhale20192ndplacesolution'  # project path
traindf = pd.read_csv('train.csv')  # train.csv path
TRNIMGSDIR = '/train/'  # train data path
TSTIMGSDIR = '/test/'  # test data path
```
train resnet101 256x512 fold 0:
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --mode=train --model=resnet101 --image_h=256 --image_w=512 --fold_index=0 --batch_size=128
train resnet101 512x512 fold 0:
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --mode=train --model=resnet101 --image_h=512 --image_w=512 --fold_index=0 --batch_size=128
predict resnet101 256x512 fold 0 model:
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --mode=test --model=resnet101 --image_h=256 --image_w=512 --fold_index=0 --batch_size=128 --pretrained_mode=max_valid_model.pth
train resnet101 256x512 fold 0 with pseudo labeling:
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --mode=train --model=resnet101 --image_h=256 --image_w=512 --fold_index=0 --batch_size=128 --is_pseudo=True
predict resnet101 256x512 fold 0 model with pseudo labeling:
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --mode=test --model=resnet101 --image_h=256 --image_w=512 --fold_index=0 --batch_size=128 --is_pseudo=True --pretrained_mode=max_valid_model.pth
The final submission is the weighted average of the predictions from 10 checkpoints:
python ensemble.py
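As a rough idea of what a weighted average over checkpoints involves, here is a minimal sketch. It is not the contents of ensemble.py: the function names `weighted_average` and `top_k`, the weights, and the assumption that each checkpoint produces a score matrix of shape (n_images, n_classes) are all hypothetical.

```python
import numpy as np

def weighted_average(score_list, weights):
    """Blend per-model score matrices of shape (n_images, n_classes)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                    # normalise so the weights sum to 1
    stacked = np.stack(score_list)     # shape: (n_models, n_images, n_classes)
    # Contract the model axis: the result has shape (n_images, n_classes).
    return np.tensordot(w, stacked, axes=1)

def top_k(scores, k=5):
    """Indices of the k highest-scoring classes per image, best first."""
    return np.argsort(-scores, axis=1)[:, :k]
```

For example, `top_k(weighted_average([scores_a, scores_b], [1, 3]))` would give the ranked class indices per image with the second model weighted three times as heavily as the first; the default `k=5` is an assumed value.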