===========================
Population Shift Monitoring
===========================

|build| |docs| |release| |release_date| |downloads|

|logo|

``popmon`` is a package that allows one to check the stability of a dataset.
``popmon`` works with both pandas and spark datasets.

``popmon`` creates histograms of features binned in time-slices, and compares the stability of the profiles and distributions of those histograms using statistical tests, both over time and with respect to a reference. It works with numerical, ordinal, and categorical features, and the histograms can be higher-dimensional, e.g. it can also track correlations between any two features.
``popmon`` can automatically flag and alert on changes observed over time, such as trends, shifts, peaks, outliers, anomalies, changing correlations, etc., using monitoring business rules.
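
The idea described above can be illustrated with plain pandas and numpy. The snippet below is only a conceptual sketch and is not how ``popmon`` is implemented: the synthetic data, the choice of the first four weeks as reference, the total variation distance, and the ``traffic_light`` thresholds are all assumptions made for illustration.

.. code-block:: python

  import numpy as np
  import pandas as pd

  # Synthetic data (assumption): eight weeks of 'age' values whose mean
  # shifts from 40 to 55 in the last two weeks.
  rng = np.random.default_rng(0)
  start = pd.Timestamp("2020-01-06")
  frames = []
  for week in range(8):
      mean_age = 40.0 if week < 6 else 55.0
      day_offsets = 7 * week + rng.integers(0, 7, size=500)
      frames.append(
          pd.DataFrame(
              {
                  "date": start + pd.to_timedelta(day_offsets, unit="D"),
                  "age": rng.normal(mean_age, 10.0, size=500),
              }
          )
      )
  df = pd.concat(frames, ignore_index=True)

  # Assign every record to a weekly time slice.
  df["week"] = (df["date"] - start).dt.days // 7

  bin_edges = np.arange(0.0, 101.0, 10.0)  # fixed bins of width 10


  def normalized_hist(values):
      """Histogram of `values` on the fixed bin edges, normalized to sum to 1."""
      counts, _ = np.histogram(values, bins=bin_edges)
      return counts / max(counts.sum(), 1)


  # Reference distribution (assumption): the first four weeks.
  reference = normalized_hist(df.loc[df["week"] < 4, "age"])

  # Compare each weekly histogram with the reference using the
  # total variation distance (0 = identical, 1 = disjoint).
  distances = df.groupby("week")["age"].apply(
      lambda ages: 0.5 * np.abs(normalized_hist(ages) - reference).sum()
  )


  def traffic_light(distance, yellow=0.15, red=0.30):
      """Map a distance to a flag; the thresholds are hypothetical."""
      if distance >= red:
          return "red"
      if distance >= yellow:
          return "yellow"
      return "green"


  flags = distances.map(traffic_light)
  print(pd.DataFrame({"distance": distances.round(3), "flag": flags}))

In this sketch the weeks drawn from the shifted distribution end up far from the reference and are flagged, while the earlier weeks stay within the sampling noise.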

|example|

Announcements

Spark 3.0

With Spark 3.0, based on Scala 2.12, make sure to pick up the correct ``histogrammar`` jar files:

.. code-block:: python

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.config(
      "spark.jars.packages",
      "io.github.histogrammar:histogrammar_2.12:1.0.11,io.github.histogrammar:histogrammar-sparksql_2.12:1.0.11",
  ).getOrCreate()

For Spark 2.X compiled against Scala 2.11, simply replace ``2.12`` with ``2.11`` in the string above.

January 29, 2021
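
The Scala-version substitution described in this announcement can be wrapped in a small helper. This is a sketch only: ``histogrammar_packages`` is a hypothetical function, not part of ``popmon`` or ``histogrammar``, and it assumes the Maven artifact names carry the usual ``_<scala version>`` suffix.

.. code-block:: python

  def histogrammar_packages(scala_version="2.12", histogrammar_version="1.0.11"):
      """Build a value for Spark's ``spark.jars.packages`` setting (hypothetical helper)."""
      artifacts = ("histogrammar", "histogrammar-sparksql")
      return ",".join(
          f"io.github.histogrammar:{artifact}_{scala_version}:{histogrammar_version}"
          for artifact in artifacts
      )


  # Spark 2.X compiled against Scala 2.11
  print(histogrammar_packages("2.11"))

The returned string can then be passed as the second argument of ``SparkSession.builder.config("spark.jars.packages", ...)``.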

Documentation

The entire ``popmon`` documentation including tutorials can be found at `read-the-docs <https://popmon.readthedocs.io>`_.

Examples

  • Flight Delays and Cancellations Kaggle data
  • Synthetic data (code example below)

Notebooks

.. list-table::
   :widths: 80 20
   :header-rows: 1

   * - Tutorial
     - Colab link
   * - Basic tutorial
     - |notebookbasiccolab|
   * - Detailed example (featuring configuration, Apache Spark and more)
     - |notebookadvancedcolab|
   * - Incremental datasets (online analysis)
     - |notebookincrementaldatacolab|

Check it out

The ``popmon`` library requires Python 3.6+ and is pip friendly. To get started, simply do:

.. code-block:: bash

  $ pip install popmon

or check out the code from our GitHub repository:

.. code-block:: bash

  $ git clone https://github.com/ing-bank/popmon.git
  $ pip install -e popmon

where in this example the code is installed in edit mode (option -e).

You can now use the package in Python with:

.. code-block:: python

  import popmon

Congratulations, you are now ready to use the popmon library!

Quick run

As a quick example, you can do:

.. code-block:: python

  import pandas as pd
  import popmon
  from popmon import resources

  # open synthetic data
  df = pd.read_csv(resources.data('test.csv.gz'), parse_dates=['date'])
  df.head()

  # generate stability report using automatic binning of all encountered features
  # (importing popmon automatically adds this functionality to a dataframe)
  report = df.pm_stability_report(time_axis='date', features=['date:age', 'date:gender'])

  # to show the output of the report in a Jupyter notebook you can simply run:
  report

  # or save the report to file
  report.to_file("monitoring_report.html")

To specify your own binning specifications and features you want to report on, you do:

.. code-block:: python

  # time-axis specifications alone; all other features are auto-binned.
  report = df.pm_stability_report(time_axis='date', time_width='1w', time_offset='2020-1-6')

  # histogram selections. Here 'date' is the first axis of each histogram.
  features = [
      'date:isActive', 'date:age', 'date:eyeColor', 'date:gender',
      'date:latitude', 'date:longitude', 'date:isActive:age'
  ]

  # Specify your own binning specifications for individual features or combinations thereof.
  # This bin specification uses open-ended ("sparse") histograms; unspecified features get
  # auto-binned. The time-axis binning, when specified here, needs to be in nanoseconds.
  bin_specs = {
      'longitude': {'bin_width': 5.0, 'bin_offset': 0.0},
      'latitude': {'bin_width': 5.0, 'bin_offset': 0.0},
      'age': {'bin_width': 10.0, 'bin_offset': 0.0},
      'date': {'bin_width': pd.Timedelta('4w').value,
               'bin_offset': pd.Timestamp('2015-1-1').value}
  }

  # generate stability report
  report = df.pm_stability_report(features=features, bin_specs=bin_specs, time_axis=True)
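
To make the nanosecond requirement for the time axis concrete, the following sketch works out the two numbers used for the ``date`` bins. The helper ``time_bin_index`` is hypothetical and assumes that a sparse bin is indexed as ``floor((value - bin_offset) / bin_width)``; it is meant as an illustration of the arithmetic, not as a description of the library's internals.

.. code-block:: python

  import pandas as pd

  # Four weeks expressed in nanoseconds: 4 weeks * 7 days * 24 h * 3600 s * 1e9 ns
  bin_width_ns = pd.Timedelta(weeks=4).value
  assert bin_width_ns == 4 * 7 * 24 * 3600 * 10**9

  # Offset of the first bin: nanoseconds since the Unix epoch at 2015-01-01
  bin_offset_ns = pd.Timestamp("2015-01-01").value


  def time_bin_index(timestamp, bin_width=bin_width_ns, bin_offset=bin_offset_ns):
      """Index of the time bin containing `timestamp` (assumed floor-division rule)."""
      return (pd.Timestamp(timestamp).value - bin_offset) // bin_width


  for day in ["2015-01-01", "2015-01-28", "2015-01-29"]:
      print(day, time_bin_index(day))

Under this assumption, every timestamp in the 28 days starting at the offset shares one bin, and the next bin starts exactly four weeks later.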

These examples also work with spark dataframes. You can see the output of such example notebook code here. For all available examples, please see the tutorials at read-the-docs.

Resources

Presentations

+-----------------------------------------------------------+---------------------------------------+------------------+-------------------------+
| Title                                                     | Host                                  | Date             | Speaker                 |
+-----------------------------------------------------------+---------------------------------------+------------------+-------------------------+
| Popmon - population monitoring made easy                  | Data Lunch @ Eneco                    | October 29, 2020 | Max Baak, Simon Brugman |
+-----------------------------------------------------------+---------------------------------------+------------------+-------------------------+
| Popmon - population monitoring made easy                  | Data Science Summit 2020              | October 16, 2020 | Max Baak                |
+-----------------------------------------------------------+---------------------------------------+------------------+-------------------------+
| Population Shift Monitoring Made Easy: the popmon package | Online Data Science Meetup @ ING WBAA | July 8, 2020     | Tomas Sostak            |
+-----------------------------------------------------------+---------------------------------------+------------------+-------------------------+
| Popmon: Population Shift Monitoring Made Easy             | PyData Fest Amsterdam 2020            | June 16, 2020    | Tomas Sostak            |
+-----------------------------------------------------------+---------------------------------------+------------------+-------------------------+
| Popmon: Population Shift Monitoring Made Easy             | Amundsen Community Meetup             | June 4, 2020     | Max Baak                |
+-----------------------------------------------------------+---------------------------------------+------------------+-------------------------+

Articles

+--------------------------------------------------------------------+--------------+----------------+
| Title                                                              | Date         | Author         |
+--------------------------------------------------------------------+--------------+----------------+
| Popmon Open Source Package — Population Shift Monitoring Made Easy | May 20, 2020 | Nicole Mpozika |
+--------------------------------------------------------------------+--------------+----------------+

Project contributors

This package was authored by ING Wholesale Banking Advanced Analytics. Special thanks to the following people who have contributed to the development of this package:

Ahmet Erdem, Fabian Jansen, Nanne Aben, Mathieu Grimal.

Contact and support

  • Issues & Ideas & Support: https://github.com/ing-bank/popmon/issues

Please note that ING WBAA provides support only on a best-effort basis.

License

Copyright ING WBAA.

``popmon`` is completely free, open-source and licensed under the MIT license.

.. |logo| image:: https://raw.githubusercontent.com/ing-bank/popmon/master/docs/source/assets/popmon-logo.png
    :alt: POPMON logo
    :target: https://github.com/ing-bank/popmon
.. |example| image:: https://raw.githubusercontent.com/ing-bank/popmon/master/docs/source/assets/trafficlightoverview.png
    :alt: Traffic Light Overview
.. |build| image:: https://github.com/ing-bank/popmon/workflows/build/badge.svg
    :alt: Build status
.. |docs| image:: https://readthedocs.org/projects/popmon/badge/?version=latest
    :alt: Package docs status
    :target: https://popmon.readthedocs.io
.. |release| image:: https://img.shields.io/github/v/release/ing-bank/popmon
    :alt: Latest GitHub release
    :target: https://github.com/ing-bank/popmon/releases
.. |release_date| image:: https://img.shields.io/github/release-date/ing-bank/popmon
    :alt: GitHub Release Date
    :target: https://github.com/ing-bank/popmon/releases
.. |notebookbasiccolab| image:: https://colab.research.google.com/assets/colab-badge.svg
    :alt: Open in Colab
    :target: https://colab.research.google.com/github/ing-bank/popmon/blob/master/popmon/notebooks/popmontutorialbasic.ipynb
.. |notebookadvancedcolab| image:: https://colab.research.google.com/assets/colab-badge.svg
    :alt: Open in Colab
    :target: https://colab.research.google.com/github/ing-bank/popmon/blob/master/popmon/notebooks/popmontutorialadvanced.ipynb
.. |notebookincrementaldatacolab| image:: https://colab.research.google.com/assets/colab-badge.svg
    :alt: Open in Colab
    :target: https://colab.research.google.com/github/ing-bank/popmon/blob/master/popmon/notebooks/popmontutorialincrementaldata.ipynb
.. |downloads| image:: https://pepy.tech/badge/popmon
    :alt: PyPi downloads
    :target: https://pepy.tech/project/popmon
