awesome-ensemble-learning

by yzhao062

Ensemble learning related books, papers, videos, and toolboxes



Awesome Ensemble Learning
=========================

.. image:: https://img.shields.io/github/stars/yzhao062/awesome-ensemble-learning.svg
   :target: https://github.com/yzhao062/awesome-ensemble-learning/stargazers
   :alt: GitHub stars

.. image:: https://img.shields.io/github/forks/yzhao062/awesome-ensemble-learning.svg?color=blue
   :target: https://github.com/yzhao062/awesome-ensemble-learning/network
   :alt: GitHub forks

.. image:: https://img.shields.io/github/license/yzhao062/awesome-ensemble-learning.svg?color=blue
   :target: https://github.com/yzhao062/awesome-ensemble-learning/blob/master/LICENSE
   :alt: License

.. image:: https://awesome.re/badge-flat2.svg
   :target: https://awesome.re/badge-flat2.svg
   :alt: Awesome


Ensemble Learning (also known as Ensembling) is an exciting yet challenging field. Ensembling combines multiple base models to achieve predictive performance that is often better than that of any constituent model alone [#Opitz1999Popular]_. It has proven critical in many practical applications and data science competitions [#Bell2007Lessons]_, e.g., Kaggle.
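As a rough illustration of why combining models can help, the sketch below implements majority voting in plain Python and computes the accuracy of a majority vote over ``n`` base classifiers. It assumes a binary task, an odd ``n``, and base classifiers whose errors are independent and equally likely, which is an idealization that real ensembles only approximate. The function names are hypothetical and not taken from any library.

```python
from collections import Counter
from math import comb


def majority_vote(predictions):
    """Return the most common label among the base models' predictions."""
    return Counter(predictions).most_common(1)[0][0]


def majority_accuracy(p, n):
    """Probability that a majority of n voters is correct.

    Assumes a binary task, odd n, and independent voters that are each
    correct with probability p (an idealized setting).
    """
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1)
    )


if __name__ == "__main__":
    print(majority_vote(["spam", "ham", "spam"]))
    for n in (1, 3, 5, 11):
        print(n, round(majority_accuracy(0.7, n), 4))
```

Under these assumptions, three classifiers that are each right 70% of the time give a majority vote that is right about 78.4% of the time. Correlated errors shrink this gain, which is one reason ensemble diversity matters.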

To promote the learning of ensembling, we create this repository with:

#. Books & Academic Papers
#. Online Courses and Videos
#. Open-source and Commercial Libraries/Toolboxes and Datasets
#. Key Conferences & Journals

More items will be added to the repository. Please feel free to suggest other key resources by opening an issue report, submitting a pull request, or dropping me an email @ ([email protected]). Enjoy reading!


Table of Contents
-----------------

  • 1. Books & Tutorials

    • 1.1. Books
    • 1.2. Tutorials

  • 2. Courses/Seminars/Videos

  • 3. Toolboxes & Datasets

    • 3.1. Toolboxes
    • 3.2. Datasets

  • 4. Papers

    • 4.1. Overview & Survey Papers
    • 4.2. Key Algorithms
    • 4.3. Boosting
    • 4.4. Clustering Ensemble
    • 4.5. Outlier Ensemble
    • 4.6. Ensemble Learning for Data Stream

  • 5. Key Conferences/Workshops/Journals

    • 5.1. Conferences & Workshops
    • 5.2. Journals

1. Books & Tutorials
--------------------

1.1. Books
^^^^^^^^^^

Ensemble Methods: Foundations and Algorithms by Zhi-Hua Zhou [#Zhou2012Ensemble]_: Classic textbook covering most ensemble learning techniques. A must-read for people in the field.
[Full Book] 

Ensemble Machine Learning: Methods and Applications edited by Cha Zhang and Yunqian Ma [#Zhang2012Ensemble]_: Responding to a shortage of literature dedicated to the topic, this volume offers comprehensive coverage of state-of-the-art ensemble learning techniques, including various contributions from researchers in leading industrial research labs.

Applications of Supervised and Unsupervised Ensemble Methods edited by Oleg Okun [#Okun2009Applications]_: This book contains the extended papers presented at the 2nd Workshop on Supervised and Unsupervised Ensemble Methods and their Applications (SUEMA), held in conjunction with ECAI’2008.

Data Mining and Knowledge Discovery Handbook, Chapter 45 (Ensemble Methods for Classifiers) by Lior Rokach [#Rokach2005Ensemble]_: This chapter provides an overview of ensemble methods in classification tasks. It presents all important types of ensemble methods, including boosting and bagging, and discusses combining methods and modeling issues such as ensemble diversity and ensemble size.

Outlier Ensembles: An Introduction by Charu Aggarwal and Saket Sathe [#Aggarwal2017Outlier]_: A great introductory book on ensemble learning in outlier analysis.

1.2. Tutorials
^^^^^^^^^^^^^^

======================================================================== ===== ==== ============= =========
Tutorial Title                                                           Venue Year Ref           Materials
======================================================================== ===== ==== ============= =========
On the Power of Ensemble: Supervised and Unsupervised Methods Reconciled SDM   2010 [#Gao2010On]_ [HTML]
======================================================================== ===== ==== ============= =========

2. Courses/Seminars/Videos
--------------------------

Coursera - How to Win a Data Science Competition: Learn from Top Kagglers:

  • Ensembling (92 mins)

Coursera - Machine Learning: Classification by University of Washington partly covers the topic:

  • Ensemble classifiers
  • Ensembles, Bagging, Boosting

Machine Learning and Data Mining by Prof. Alexander Ihler: `Section on ensembling (4 videos) <https://www.youtube.com/watch?v=Yvn3--rIdZg&list=PLaXDtXvwY-oDvedS3f4HW0b4KxqpJimw&index=27>`_.

3. Toolboxes & Datasets
-----------------------

3.1. Toolboxes
^^^^^^^^^^^^^^

[Python] combo: combo is a comprehensive Python toolbox for combining machine learning (ML) models and scores for various tasks, including classification, clustering, and anomaly detection. It supports the combination of ML models from core libraries such as scikit-learn and xgboost (documentation).
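To make the idea of combining scores concrete, here is a minimal sketch in plain Python. It is not combo's API: the names ``zscore`` and ``combine_scores`` are hypothetical, and the sketch assumes each detector returns one numeric score per sample, with higher meaning more anomalous. Scores are standardized first so that detectors on different scales contribute comparably, then merged with one of two common rules, averaging or taking the maximum.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Standardize one detector's scores to zero mean and unit variance."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:  # constant scores carry no ranking information
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def combine_scores(score_lists, method="average"):
    """Combine per-sample scores from several detectors.

    score_lists holds one list per detector, all of the same length.
    """
    normalized = [zscore(scores) for scores in score_lists]
    per_sample = zip(*normalized)  # regroup: one tuple of scores per sample
    if method == "average":
        return [mean(col) for col in per_sample]
    if method == "maximum":
        return [max(col) for col in per_sample]
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    detector_a = [0.1, 0.2, 0.1, 0.9]   # scores on a 0-1 scale
    detector_b = [10, 12, 11, 40]       # scores on a different scale
    print(combine_scores([detector_a, detector_b]))
    print(combine_scores([detector_a, detector_b], method="maximum"))
```

Averaging tends to smooth out individual detectors' mistakes, while the maximum emphasizes any single detector that flags a sample strongly.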

[Python] pycobra: A Python library implementing ensemble methods for regression and classification, with visualisation tools including Voronoi tessellations.

[Python] DESlib: A Python library for dynamic classifier and ensemble selection.

[Python] imbalanced-learn: A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning (documentation).

3.2. Datasets
^^^^^^^^^^^^^

As a subfield of machine learning, ensemble learning is usually tested against general machine learning benchmark datasets. Some helpful links can be found below:

  • List of datasets for machine-learning research - Wikipedia
  • UCI Machine Learning Repository
  • PMLB: a large benchmark suite for machine learning evaluation and comparison [#Olson2017PMLB]_: Dataset Repository

4. Papers
---------

4.1. Overview & Survey Papers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

============================================ =============================== ==== ========================== =========
Paper Title                                  Venue                           Year Ref                        Materials
============================================ =============================== ==== ========================== =========
Ensemble methods in machine learning         MCS                             2000 [#Dietterich2000Ensemble]_ [PDF]
Popular ensemble methods: An empirical study JAIR                            1999 [#Opitz1999Popular]_       [PDF]
Ensemble learning: A survey                  Wiley Interdisciplinary Reviews 2018 [#Sagi2018Ensemble]_       [PDF]
============================================ =============================== ==== ========================== =========

4.2. Key Algorithms
^^^^^^^^^^^^^^^^^^^

============ ====================================================================================== ================ ==== ====================== =========
Abbreviation Paper Title                                                                            Venue            Year Ref                    Materials
============ ====================================================================================== ================ ==== ====================== =========
Bagging      Bagging predictors                                                                     Machine Learning 1996 [#Breiman1996Bagging]_ [PDF]
Boosting     A decision-theoretic generalization of on-line learning and an application to boosting JCSS             1997 [#Freund1997A]_        [PDF]
N/A          Bagging, Boosting, and C4.5                                                            AAAI/IAAI        1996 [#Quinlan1996Bagging]_ [PDF]
Stacking     Stacked generalization                                                                 Neural Networks  1992 [#Wolpert1992Stacked]_ [PDF]
Stacking     Stacked regressions                                                                    Machine Learning 1996 [#Breiman1996Stacked]_ [PDF]
============ ====================================================================================== ================ ==== ====================== =========
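For readers new to these algorithms, the following is a minimal sketch of bagging (bootstrap aggregating) in plain Python, under simplifying assumptions: one-dimensional inputs, a regression task, and a one-split regression stump as the base learner. All function names are hypothetical, and the sketch is meant to show the resample-then-average idea rather than to reproduce the procedure of any paper listed above.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump on 1-D data; return a predict function."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must leave points on both sides
        lm, rm = mean(left), mean(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # all x identical: fall back to the global mean
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagging_fit(xs, ys, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate by averaging the base models' predictions."""
    return mean(m(x) for m in models)


if __name__ == "__main__":
    xs = list(range(10))
    ys = [0] * 5 + [1] * 5
    models = bagging_fit(xs, ys)
    print(bagging_predict(models, 0), bagging_predict(models, 9))
```

Boosting differs in that base models are trained sequentially with reweighted examples, and stacking differs in that a second-level model learns how to combine the base models' outputs.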

4.3. Boosting
^^^^^^^^^^^^^

============================================================ ===== ==== ============================= =========
Paper Title                                                  Venue Year Ref                           Materials
============================================================ ===== ==== ============================= =========
XGBoost: A scalable tree boosting system                     KDD   2016 [#Chen2016Xgboost]_           [PDF]
LightGBM: A highly efficient gradient boosting decision tree NIPS  2017 [#Ke2017Lightgbm]_            [PDF]
CatBoost: unbiased boosting with categorical features        NIPS  2018 [#Prokhorenkova2018CatBoost]_ [PDF]
============================================================ ===== ==== ============================= =========

4.4. Clustering Ensemble
^^^^^^^^^^^^^^^^^^^^^^^^

================================================================================= =========== ==== =========================== =========
Paper Title                                                                       Venue       Year Ref                         Materials
================================================================================= =========== ==== =========================== =========
Cluster Ensembles – A Knowledge Reuse Framework for Combining Multiple Partitions JMLR        2002 [#Strehl2002Cluster]_       [PDF]
Clusterer Ensemble                                                                KBS         2006 [#Zhou2006Clusterer]_       [PDF]
A survey of clustering ensemble algorithms                                        IJPRAI      2011 [#VegaPons2011A]_           [PDF]
Clustering ensemble method                                                        Cybernetics 2019 [#Alqurashi2019Clustering]_ [PDF]
================================================================================= =========== ==== =========================== =========

4.5. Outlier Ensemble
^^^^^^^^^^^^^^^^^^^^^

================================================================================================ =================== ==== ======================= =========
Paper Title                                                                                      Venue               Year Ref                     Materials
================================================================================================ =================== ==== ======================= =========
Outlier ensembles: position paper                                                                SIGKDD Explorations 2013 [#Aggarwal2013Outlier]_ [PDF]
Ensembles for unsupervised outlier detection: challenges and research questions a position paper SIGKDD Explorations 2014 [#Zimek2014Ensembles]_  [PDF]
Isolation forest                                                                                 ICDM                2008 [#Liu2008Isolation]_    [PDF]
Outlier detection with autoencoder ensembles                                                     SDM                 2017 [#Chen2017Outlier]_     [PDF]
An Unsupervised Boosting Strategy for Outlier Detection Ensembles                                PAKDD               2018 [#Campos2018An]_        [HTML]
LSCP: Locally selective combination in parallel outlier ensembles                                SDM                 2019 [#Zhao2019LSCP]_        [PDF]
================================================================================================ =================== ==== ======================= =========

4.6. Ensemble Learning for Data Stream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

============================================================ ===================== ==== ======================== =========
Paper Title                                                  Venue                 Year Ref                      Materials
============================================================ ===================== ==== ======================== =========
A survey on ensemble learning for data stream classification ACM Computing Surveys 2017 [#Gomes2017A]_           [PDF]
Ensemble learning for data stream analysis: A survey         Information Fusion    2017 [#Krawczyk2017Ensemble]_ [PDF]
============================================================ ===================== ==== ======================== =========

5. Key Conferences/Workshops/Journals
-------------------------------------

5.1. Conferences & Workshops
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Key data mining conference deadlines, historical acceptance rates, and more can be found at data-mining-conferences.

ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD)

ACM International Conference on Management of Data (SIGMOD)

The Web Conference (WWW)

IEEE International Conference on Data Mining (ICDM)

SIAM International Conference on Data Mining (SDM)

IEEE International Conference on Data Engineering (ICDE)

ACM International Conference on Information and Knowledge Management (CIKM)

ACM International Conference on Web Search and Data Mining (WSDM)

The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)

The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)

5.2. Journals
^^^^^^^^^^^^^

ACM Transactions on Knowledge Discovery from Data (TKDD)

IEEE Transactions on Knowledge and Data Engineering (TKDE)

ACM SIGKDD Explorations Newsletter

Data Mining and Knowledge Discovery

Knowledge and Information Systems (KAIS)

References
----------

.. [#Aggarwal2013Outlier] Aggarwal, C.C., 2013. Outlier ensembles: position paper. ACM SIGKDD Explorations Newsletter, 14(2), pp.49-58.

.. [#Aggarwal2017Outlier] Aggarwal, C.C. and Sathe, S., 2017. Outlier ensembles: An introduction. Springer.

.. [#Alqurashi2019Clustering] Alqurashi, T. and Wang, W., 2019. Clustering ensemble method. International Journal of Machine Learning and Cybernetics, 10(6), pp.1227-1246.

.. [#Bell2007Lessons] Bell, R.M. and Koren, Y., 2007. Lessons from the Netflix prize challenge. SIGKDD Explorations, 9(2), pp.75-79.

.. [#Breiman1996Bagging] Breiman, L., 1996. Bagging predictors. Machine learning, 24(2), pp.123-140.

.. [#Breiman1996Stacked] Breiman, L., 1996. Stacked regressions. Machine learning, 24(1), pp.49-64.

.. [#Campos2018An] Campos, G.O., Zimek, A. and Meira, W., 2018, June. An Unsupervised Boosting Strategy for Outlier Detection Ensembles. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 564-576). Springer, Cham.

.. [#Chen2016Xgboost] Chen, T. and Guestrin, C., 2016, August. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794). ACM.

.. [#Chen2017Outlier] Chen, J., Sathe, S., Aggarwal, C. and Turaga, D., 2017, June. Outlier detection with autoencoder ensembles. SIAM International Conference on Data Mining, pp. 90-98. Society for Industrial and Applied Mathematics.

.. [#Dietterich2000Ensemble] Dietterich, T.G., 2000, June. Ensemble methods in machine learning. In International workshop on multiple classifier systems (pp. 1-15). Springer, Berlin, Heidelberg.

.. [#Freund1997A] Freund, Y. and Schapire, R.E., 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences, 55(1), pp.119-139.

.. [#Gao2010On] Gao, J., Fan, W. and Han, J., 2010. On the power of ensemble: Supervised and unsupervised methods reconciled. In Tutorial on SIAM Data Mining Conference (SDM), Columbus, OH.

.. [#Gomes2017A] Gomes, H.M., Barddal, J.P., Enembreck, F. and Bifet, A., 2017. A survey on ensemble learning for data stream classification. ACM Computing Surveys (CSUR), 50(2), p.23.

.. [#Ke2017Lightgbm] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q. and Liu, T.Y., 2017. Lightgbm: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems (pp. 3146-3154).

.. [#Krawczyk2017Ensemble] Krawczyk, B., Minku, L.L., Gama, J., Stefanowski, J. and Woźniak, M., 2017. Ensemble learning for data stream analysis: A survey. Information Fusion, 37, pp.132-156.

.. [#Liu2008Isolation] Liu, F.T., Ting, K.M. and Zhou, Z.H., 2008, December. Isolation forest. In International Conference on Data Mining, pp. 413-422. IEEE.

.. [#Okun2009Applications] Okun, O. ed., 2009. Applications of supervised and unsupervised ensemble methods (Vol. 245). Springer.

.. [#Olson2017PMLB] Olson, R.S., La Cava, W., Orzechowski, P., Urbanowicz, R.J. and Moore, J.H., 2017. PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData mining, 10(1), p.36.

.. [#Opitz1999Popular] Opitz, D. and Maclin, R., 1999. Popular ensemble methods: An empirical study. Journal of artificial intelligence research, 11, pp.169-198.

.. [#Quinlan1996Bagging] Quinlan, J.R., 1996, August. Bagging, boosting, and C4.5. In AAAI/IAAI, Vol. 1 (pp. 725-730).

.. [#Prokhorenkova2018CatBoost] Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V. and Gulin, A., 2018. CatBoost: unbiased boosting with categorical features. In Advances in Neural Information Processing Systems (pp. 6638-6648).

.. [#Rokach2005Ensemble] Rokach, L., 2005. Ensemble methods for classifiers. In: Maimon, O. and Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA.

.. [#Sagi2018Ensemble] Sagi, O. and Rokach, L., 2018. Ensemble learning: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(4), p.e1249.

.. [#Strehl2002Cluster] Strehl, A. and Ghosh, J., 2002. Cluster ensembles---a knowledge reuse framework for combining multiple partitions. Journal of machine learning research, 3(Dec), pp.583-617.

.. [#VegaPons2011A] Vega-Pons, S. and Ruiz-Shulcloper, J., 2011. A survey of clustering ensemble algorithms. International Journal of Pattern Recognition and Artificial Intelligence, 25(03), pp.337-372.

.. [#Wolpert1992Stacked] Wolpert, D.H., 1992. Stacked generalization. Neural networks, 5(2), pp.241-259.

.. [#Zhao2019LSCP] Zhao, Y., Nasrullah, Z., Hryniewicki, M.K. and Li, Z., 2019, May. LSCP: Locally selective combination in parallel outlier ensembles. In Proceedings of the 2019 SIAM International Conference on Data Mining (SDM), pp. 585-593. Society for Industrial and Applied Mathematics.

.. [#Zhang2012Ensemble] Zhang, C. and Ma, Y. eds., 2012. Ensemble machine learning: methods and applications. Springer Science & Business Media.

.. [#Zhou2006Clusterer] Zhou, Z.H. and Tang, W., 2006. Clusterer ensemble. Knowledge-Based Systems, 19(1), pp.77-83.

.. [#Zhou2012Ensemble] Zhou, Z.H., 2012. Ensemble methods: foundations and algorithms. Chapman and Hall/CRC.

.. [#Zimek2014Ensembles] Zimek, A., Campello, R.J. and Sander, J., 2014. Ensembles for unsupervised outlier detection: challenges and research questions a position paper. ACM SIGKDD Explorations Newsletter, 15(1), pp.11-22.
