ensemble_amazon

by kaz-Anova

209 Stars 72 Forks Last release: Not found Apache License 2.0 20 Commits 0 Releases

Code to share different ensemble techniques, with a focus on meta-stacking, using data from the Amazon.com - Employee Access Challenge Kaggle competition.

This code is part of the EE381V Large-Scale Machine Learning PhD-level course at the University of Texas (taught by Alexandros G. Dimakis) and aims to show different ensemble techniques for AUC-type (classification) problems.

The code is for educational purposes and does not aim to achieve a high score.

Requirements

  • Python 2.7
  • XGBoost
  • scikit-learn (sklearn)
  • numpy
  • scipy
  • pandas

Download the train.csv and test.csv data from the Kaggle competition Amazon.com - Employee Access Challenge: https://www.kaggle.com/c/amazon-employee-access-challenge

The ensemble methods

  • The code first trains several models on different transformations of the data and saves their out-of-fold predictions
  • It then tests different ensemble techniques on those predictions:
    • Simple average
    • Weighted average based on CV
    • Weighted rank average based on CV
    • Geomean weighted rank average based on CV
    • Another model (ExtraTreesClassifier from sklearn) used to perform meta-stacking
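The four averaging schemes above can be sketched in a few lines. This is a minimal illustration, not the code from the repository's scripts: the function names are hypothetical, `preds` is assumed to be an array with one column of scores per model, and `weights` is assumed to be supplied by the caller (for example each model's CV AUC; how the scripts actually derive their weights is not stated here).

```python
import numpy as np
from scipy.stats import rankdata


def simple_average(preds):
    """Unweighted mean over models; preds has shape (n_samples, n_models)."""
    return np.asarray(preds, dtype=float).mean(axis=1)


def weighted_average(preds, weights):
    """Weighted mean of the raw scores; weights are normalised to sum to 1."""
    w = np.asarray(weights, dtype=float)
    return np.dot(np.asarray(preds, dtype=float), w / w.sum())


def to_ranks(preds):
    """Replace each model's scores by their ranks, scaled into (0, 1]."""
    preds = np.asarray(preds, dtype=float)
    ranks = np.apply_along_axis(rankdata, 0, preds)  # ties get the average rank
    return ranks / preds.shape[0]


def weighted_rank_average(preds, weights):
    """Weighted arithmetic mean of the per-model ranks."""
    return weighted_average(to_ranks(preds), weights)


def geomean_weighted_rank_average(preds, weights):
    """Weighted geometric mean of the per-model ranks."""
    w = np.asarray(weights, dtype=float)
    return np.exp(np.dot(np.log(to_ranks(preds)), w / w.sum()))


# Toy scores from three hypothetical models on four rows (made-up numbers).
demo_preds = np.array([[0.9, 0.8, 0.7],
                       [0.2, 0.4, 0.1],
                       [0.6, 0.3, 0.5],
                       [0.1, 0.2, 0.3]])
demo_weights = [0.88, 0.89, 0.90]  # assumed per-model weights

print(simple_average(demo_preds))
print(weighted_rank_average(demo_preds, demo_weights))
print(geomean_weighted_rank_average(demo_preds, demo_weights))
```

Rank-based blends only use the ordering of each model's scores, which suits AUC because AUC itself depends only on ordering.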

Replicate solution

Inside a folder that contains train.csv and test.csv:

  • Run amazon_main_xgboost_count_2D.py
  • Run amazon_main_logit_3way_best.py
  • Run amazon_main_logit_2D.py
  • Run amazon_main_xgboost.py
  • Run amazon_main_logit_3way.py
  • Run amazon_main_xgboost_count.py
  • Run amazon_main_xgboost_count_3D.py

This will yield the following results in Kaggle's Private Leaderboard and internal 5-fold cv

Model name            AUC - Private LB   AUC - CV 5-fold
mainxgboost           0.89096            0.876971
amazonmainlogit2D     0.89534            0.877267
mainlogit3way         0.89554            0.878507
mainlogit3waybest     0.89792            0.882932
mainxgbooscount       0.88187            0.870671
mainxgbooscount2D     0.90127            0.888981
mainxgbooscount_3D    0.904              0.893425
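For readers unfamiliar with out-of-fold predictions and a 5-fold CV AUC, here is a minimal sketch of the idea. It does not reproduce the scripts above: the synthetic data, the logistic regression model and the helper name `out_of_fold_predictions` are assumptions made for illustration, and pooling all held-out predictions into one AUC is only one of several ways to report a CV score.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def out_of_fold_predictions(model, X, y, n_splits=5, seed=0):
    """Return one held-out probability per training row."""
    oof = np.zeros(len(y), dtype=float)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, valid_idx in folds.split(X, y):
        fold_model = clone(model)  # fresh, unfitted copy for every fold
        fold_model.fit(X[train_idx], y[train_idx])
        # Each row is scored by a model that never saw it during training.
        oof[valid_idx] = fold_model.predict_proba(X[valid_idx])[:, 1]
    return oof


# Synthetic stand-in for the competition data (assumption, not train.csv).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
oof = out_of_fold_predictions(LogisticRegression(max_iter=1000), X, y)
cv_auc = roc_auc_score(y, oof)
print("pooled out-of-fold AUC: %.4f" % cv_auc)
```

Saving `oof` for each base model gives the ensemble step a column of predictions that is free of training leakage.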

  • Run AUC_Average.py
  • Run AUC_Weighted_Average.py
  • Run AUC_Rank_Weighted_Average.py
  • Run AUC_Geo_Rank_Weighted_Average.py
  • Run amazon_stacking.py

This will yield:

Model name                   AUC - Private LB   AUC - CV 5-fold
AUCAverage                   0.90725            0.893209
AUCWeightedAverage           0.91121            0.899529
AUCRankWeightedAverage       0.90916            0.897925
AUCGeoRankWeightedAverage    0.90988            0.898586
amazon_stacking              0.91206            0.899851
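Meta-stacking treats the base models' out-of-fold predictions as input features for a second-level model. The sketch below shows the general pattern with ExtraTreesClassifier. It is not the content of amazon_stacking.py: the base-model scores are simulated, and the hyperparameters and variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.RandomState(0)
n_train, n_test, n_models = 400, 100, 3
y_train = rng.randint(0, 2, size=n_train)
y_test = rng.randint(0, 2, size=n_test)  # only used to simulate test scores


def noisy_scores(y, n_models, rng):
    """Hypothetical base-model scores: the label plus Gaussian noise."""
    return y[:, None] + rng.normal(scale=0.8, size=(len(y), n_models))


# Stand-ins for the saved out-of-fold (train) and test predictions,
# one column per base model.
train_meta = noisy_scores(y_train, n_models, rng)
test_meta = noisy_scores(y_test, n_models, rng)

# Illustrative hyperparameters, not the repository's settings.
meta_model = ExtraTreesClassifier(n_estimators=200, min_samples_leaf=5,
                                  random_state=0)

# Cross-validated estimate of the stacker's own AUC on the training rows.
stack_oof = cross_val_predict(meta_model, train_meta, y_train,
                              cv=5, method="predict_proba")[:, 1]
stack_cv_auc = roc_auc_score(y_train, stack_oof)
print("stacker CV AUC: %.4f" % stack_cv_auc)

# Fit on all training rows, then score the test-set predictions.
meta_model.fit(train_meta, y_train)
test_pred = meta_model.predict_proba(test_meta)[:, 1]
```

The stacker must be trained on out-of-fold predictions. In-sample base-model predictions would leak the labels and inflate its score.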
