sklearn2pmml

by jpmml

jpmml /sklearn2pmml

Python library for converting Scikit-Learn pipelines to PMML

469 Stars 89 Forks Last release: Not found GNU Affero General Public License v3.0 269 Commits 118 Releases

Available items

No Items, yet!

The developer of this repository has not created any items for sale yet. Need a bug fixed? Help with integration? A different license? Create a request here:

SkLearn2PMML

Python library for converting Scikit-Learn pipelines to PMML.

Features

This library is a thin wrapper around the JPMML-SkLearn command-line application.

For a list of supported Estimator and Transformer types, please refer to JPMML-SkLearn features.

Prerequisites

  • Python 2.7, 3.4 or newer.
  • Java 1.8 or newer. The Java executable must be available on system path.

Installation

Installing a release version from PyPI:

pip install sklearn2pmml

Alternatively, installing the latest snapshot version from GitHub:

pip install --upgrade git+https://github.com/jpmml/sklearn2pmml.git

Usage

A typical workflow can be summarized as follows:

  1. Create a
    PMMLPipeline
    object, and populate it with pipeline steps as usual. Class
    sklearn2pmml.pipeline.PMMLPipeline
    extends class
    sklearn.pipeline.Pipeline
    with the following functionality:
    • If the
      PMMLPipeline.fit(X, y)
      method is invoked with
      pandas.DataFrame
      or
      pandas.Series
      object as an
      X
      argument, then its column names are used as feature names. Otherwise, feature names default to "x1", "x2", .., "x{numberoffeatures}".
    • If the
      PMMLPipeline.fit(X, y)
      method is invoked with
      pandas.Series
      object as an
      y
      argument, then its name is used as the target name (for supervised models). Otherwise, the target name defaults to "y".
  2. Fit and validate the pipeline as usual.
  3. Optionally, compute and embed verification data into the
    PMMLPipeline
    object by invoking
    PMMLPipeline.verify(X)
    method with a small but representative subset of training data.
  4. Convert the
    PMMLPipeline
    object to a PMML file in local filesystem by invoking utility method
    sklearn2pmml.sklearn2pmml(pipeline, pmml_destination_path)
    .

Developing a simple decision tree model for the classification of iris species:

import pandas

iris_df = pandas.read_csv("Iris.csv")

from sklearn.tree import DecisionTreeClassifier from sklearn2pmml.pipeline import PMMLPipeline

pipeline = PMMLPipeline([ ("classifier", DecisionTreeClassifier()) ]) pipeline.fit(iris_df[iris_df.columns.difference(["Species"])], iris_df["Species"])

from sklearn2pmml import sklearn2pmml

sklearn2pmml(pipeline, "DecisionTreeIris.pmml", with_repr = True)

Developing a more elaborate logistic regression model for the same:

import pandas

iris_df = pandas.read_csv("Iris.csv")

from sklearn_pandas import DataFrameMapper from sklearn.decomposition import PCA from sklearn.feature_selection import SelectKBest from sklearn.preprocessing import Imputer from sklearn.linear_model import LogisticRegression from sklearn2pmml.decoration import ContinuousDomain from sklearn2pmml.pipeline import PMMLPipeline

pipeline = PMMLPipeline([ ("mapper", DataFrameMapper([ (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), Imputer()]) ])), ("pca", PCA(n_components = 3)), ("selector", SelectKBest(k = 2)), ("classifier", LogisticRegression()) ]) pipeline.fit(iris_df, iris_df["Species"])

from sklearn2pmml import sklearn2pmml

sklearn2pmml(pipeline, "LogisticRegressionIris.pmml", with_repr = True)

Documentation

Up-to-date:

Slightly outdated:

De-installation

Uninstalling:

pip uninstall sklearn2pmml

License

SkLearn2PMML is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use SkLearn2PMML in a proprietary software project, then it is possible to enter into a licensing agreement which makes SkLearn2PMML available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

SkLearn2PMML is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact [email protected]

We use cookies. If you continue to browse the site, you agree to the use of cookies. For more information on our use of cookies please see our Privacy Policy.