# KDD Cup 2020 Challenges for Modern E-Commerce Platform: Multimodalities Recall, 1st Place Solution
No Data
In `testA.tsv` we find many queries with weird grammar, such as a be-verb as the first word, which harms the performance of the contextualized embeddings. Hence we filter out such words to make a query read more like a sentence.
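As a rough illustration, here is a minimal sketch of that filtering, assuming a hand-picked list of be-verbs (the actual word list and code in the notebooks may differ):

```python
# Hypothetical sketch of the query cleaning described above: drop a leading
# be-verb so the query reads more like a natural sentence.
BE_VERBS = {"is", "are", "was", "were", "be", "been", "being"}

def clean_query(query: str) -> str:
    tokens = query.lower().split()
    # Remove a be-verb only when it appears as the first word of the query.
    if tokens and tokens[0] in BE_VERBS:
        tokens = tokens[1:]
    return " ".join(tokens)

print(clean_query("is cute dog wearing a hat"))  # -> "cute dog wearing a hat"
```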
The box features are scaled into `[0,1]`. We use `token_type_ids` embeddings with a `[SEP]` token to separate the two features.
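As a rough illustration of that layout, here is a hedged sketch; the choice of `BertTokenizer`, the placeholder tokens for the image slots, and which two segments are paired are all assumptions on our part, not the repository's code:

```python
import torch
from transformers import BertTokenizer

# Hedged sketch of a paired input: segment 0 holds the query tokens, segment 1
# holds placeholder slots that the model would overwrite with projected
# image-region features; a [SEP] token sits between the two segments.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def build_pair_inputs(query, num_boxes):
    query_ids = tokenizer.encode(query, add_special_tokens=False)
    input_ids = ([tokenizer.cls_token_id] + query_ids + [tokenizer.sep_token_id]
                 + [tokenizer.mask_token_id] * num_boxes + [tokenizer.sep_token_id])
    # token_type_ids: 0 for the query segment, 1 for the image-feature segment
    token_type_ids = [0] * (len(query_ids) + 2) + [1] * (num_boxes + 1)
    return torch.tensor([input_ids]), torch.tensor([token_type_ids])

input_ids, token_type_ids = build_pair_inputs("cute dog wearing a hat", num_boxes=5)
```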
From `valid.tsv` we can see that the candidates of a query usually share similar words. We thus give a higher sampling probability to the queries sharing the same words as the positive query. However, this probability distribution is not easy to tune, and another annoying thing is that the sampling ratio of the similar queries can be neither too high nor too low. Therefore, we come up with an easily tuned method:
- Find the `topk` most similar queries (those sharing the most words) for each query, as sketched below.
- Sample the negatives from these `topk` most similar queries, with the amount matching the number of features of the query.
- The only hyperparameter is then `topk`, and we set `topk = max({number of features of each query}) * 3`.
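A minimal sketch of the shared-word ranking behind this selection, using the toy query pool from the Q&A section further below; the function and variable names are ours, not the repository's:

```python
# Pick the topk most similar queries by shared-word counts, then drop the
# target query itself so only negatives remain.
def topk_similar_queries(target: str, query_pool: list, topk: int) -> list:
    target_words = set(target.split())
    # Count shared words between the target query and every query in the pool.
    counts = [(len(target_words & set(q.split())), q) for q in query_pool]
    counts.sort(key=lambda x: x[0], reverse=True)
    return [q for _, q in counts[:topk] if q != target]

pool = ['a cute dog', 'a cute bear', 'korean style of cat',
        'japanese little dog', 'whatever it is']
print(topk_similar_queries('a cute dog', pool, topk=4))
# -> ['a cute bear', 'japanese little dog', 'korean style of cat']
```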
We make use of `valid.tsv`, which is the only ground truth we have. Therefore, we extract the embeddings from the trained models as the new classifier's input, then use the `0/1` labels as the training target to generate our final prediction. We choose `LightGBM` as this new classifier, with all default hyperparameters.
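As a rough illustration, here is a minimal sketch of this stacking step; the embedding files and array names are assumptions, not the actual pipeline:

```python
import numpy as np
import lightgbm as lgb

# Assumed inputs: embeddings extracted from the trained models on valid.tsv
# and the corresponding 0/1 labels (true match or not).
X = np.load("valid_embeddings.npy")
y = np.load("valid_labels.npy")

clf = lgb.LGBMClassifier()           # all default hyperparameters, as stated above
clf.fit(X, y)
scores = clf.predict_proba(X)[:, 1]  # match probabilities used to rank candidates
```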
From `valid.tsv` we can also see that a product with fewer appearances in `valid.tsv` has a higher probability of being the answer. We thus only keep the products that occur only once.
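A small sketch of that filtering rule; the file and column names are hypothetical, not the repository's actual schema:

```python
import pandas as pd

# Keep only the candidate products that appear exactly once.
preds = pd.read_csv("candidates.csv")                 # rows of (query_id, product_id) candidates
counts = preds["product_id"].value_counts()
preds = preds[preds["product_id"].map(counts) == 1]   # keep products appearing exactly once
```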
Python==3.8
torch==1.4.0
transformers==2.9.0
gensim==3.8.3
lightgbm==2.3.1
Run `share_master.ipynb` before running `MCAN-RoBERTa_pair-cat_box_tfidf-neg_focal_all_shared.ipynb`, because our MCAN uses shared memory. `Visual-BERT_pair_box_tfidf-neg_focal_all.ipynb` can be run directly.
Specify `gpu_id` and `n_workers` for LightGBM as below:
python3 MCAN-RoBERTa_pair-cat_box_tfidf-neg_focal_all_predict-all_cls.py {gpu_id} {n_workers}
python3 Visual-BERT_pair_box_tfidf-neg_focal_all_predict-all_cls.py {gpu_id} {n_workers}
./main.sh
With `n_workers = 24` it takes around 5 hours to predict.

### Follow-up questions from issues

> 1. Can you give a simple example of the negative sampling method?
Let the query pool be `['a cute dog', 'a cute bear', 'korean style of cat', 'japanese little dog', 'whatever it is']` and `topk = 4`. Then for the query `'a cute dog'` we have an array of shared-word counts: `[3, 2, 0, 1, 0]`. After that, we sort the queries by this array and filter out the target query itself, so the negative queries of the target query would be `['a cute bear', 'korean style of cat', 'japanese little dog']`. Next, moving on to sampling image features: let the target query have `n` image features; then we should sample `n*k` negatives, where `k` is the negative sampling rate. We simply sample `n` queries from its negative queries `k` times, and for each sampled query we uniformly sample one image feature. Here we can see that `topk` should be at least the maximum number of features over all queries plus one.
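Following this walk-through, here is a hedged sketch of the image-feature sampling step; the data layout (a dict mapping each query to its list of image features) is an assumption:

```python
import random

# Draw n*k negative image features for one target query, as described above.
# `features_by_query` (query -> list of image features) is an assumed layout.
def sample_negative_features(target_query, negative_queries, features_by_query, k):
    n = len(features_by_query[target_query])   # the target query has n image features
    negatives = []
    for _ in range(k):                         # repeat k times -> n*k negatives overall
        # random.sample needs len(negative_queries) >= n, which is why topk must be
        # at least max(number of features per query) + 1.
        for q in random.sample(negative_queries, n):
            negatives.append(random.choice(features_by_query[q]))  # one feature per sampled query
    return negatives
```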
> 2. In this competition, 69 models were trained based on the MCAN and VisualBERT methods. Do these 69 models have any differences, such as parameters, training samples, etc.?
Since we had a large negative sampling pool with a large `topk` parameter, the only differences were the random seeds for all random parts, which should be diverse enough.
> 3. Before post-processing, a single model based on MCAN or VisualBERT is used to evaluate nDCG@5 on `valid.tsv`. How much can be achieved?
For VisualBERT, it was around 0.69. As for MCAN, it was around 0.71.
> 4. In the post-processing stage, the valid set is used to train the model. How do you evaluate the model?
We use K-fold cross-validation on `valid.tsv`, and simple blending is applied afterward.
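For illustration, a sketch of this K-fold evaluation plus simple blending with `LightGBM`; the fold count, the use of scikit-learn's `KFold`, and the array names are assumptions:

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import KFold

# Assumed arrays: embeddings and 0/1 labels on valid.tsv, plus test embeddings.
X = np.load("valid_embeddings.npy")
y = np.load("valid_labels.npy")
X_test = np.load("test_embeddings.npy")

kf = KFold(n_splits=5, shuffle=True, random_state=0)    # 5 folds is an assumption
oof = np.zeros(len(y))                                   # out-of-fold predictions for evaluation
test_preds = []
for train_idx, val_idx in kf.split(X):
    clf = lgb.LGBMClassifier()                           # default hyperparameters, as stated
    clf.fit(X[train_idx], y[train_idx])
    oof[val_idx] = clf.predict_proba(X[val_idx])[:, 1]   # score each held-out fold
    test_preds.append(clf.predict_proba(X_test)[:, 1])

blended = np.mean(test_preds, axis=0)                    # simple blending: average the fold models
```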
> 5. After post-processing, what score can a single model achieve on testA?
There was not enough time for us to test on testA, but it was around 0.87-0.88 on `valid.tsv`.
[1] Yu, Zhou, et al. "Deep Modular Co-Attention Networks for Visual Question Answering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.

[2] Li, Liunian Harold, et al. "VisualBERT: A Simple and Performant Baseline for Vision and Language." arXiv preprint arXiv:1908.03557 (2019).
We appreciate the advice and support from Prof. Shou-De Lin under grant number 109-2634-F-002-033 from the Taiwan Ministry of Science and Technology (MOST) ("Advanced Technologies for Resource-constrained Deep Learning"), the Microsoft Research Asia Collaborative Project Funding (2019), and computation resources from the National Center for High-Performance Computing.