
LightAutoML - automatic model creation framework


LightAutoML (LAMA) is a framework from the Sberbank AI Lab AutoML group for automatic creation of classification and regression models.

Currently available tasks to solve:

- binary classification
- multiclass classification
- regression
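
As a minimal illustration of how a task type is selected (the task names below are the standard LightAutoML identifiers):

```python
from lightautoml.tasks import Task

binary_task = Task('binary')          # binary classification
multiclass_task = Task('multiclass')  # multiclass classification
regression_task = Task('reg')         # regression
```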

Currently we work with datasets in which each row is an object with its specific features and a target. Multi-table datasets and sequences are under construction :)

Note: for automatic creation of interpretable models we use the AutoWoE library, which is also made by our group.
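
A hypothetical sketch of standalone AutoWoE usage (the constructor arguments and method names here are assumptions based on the AutoWoE repository and should be checked against its documentation):

```python
import pandas as pd
from autowoe import AutoWoE  # assumption: top-level AutoWoE class

train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

# assumption: fit() takes the training frame and the target column name
woe_model = AutoWoE(monotonic=False, max_bin_count=5)
woe_model.fit(train, target_name='Survived')

# assumption: predict_proba() returns scores for the positive class
test_scores = woe_model.predict_proba(test)
```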

Authors: Alexander Ryzhkov, Anton Vakhrushev, Dmitry Simakov, Vasilii Bunakov, Rinchin Damdinov, Pavel Shvets, Alexander Kirilin.

Quick tour

Let's solve the popular Kaggle Titanic competition below. There are two main ways to solve machine learning problems using LightAutoML:

* Use ready preset for tabular data:

```python
import pandas as pd
from sklearn.metrics import f1_score

from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')

automl = TabularAutoML(
    task = Task(
        name = 'binary',
        metric = lambda y_true, y_pred: f1_score(y_true, (y_pred > 0.5)*1)
    )
)
oof_pred = automl.fit_predict(
    df_train,
    roles = {'target': 'Survived', 'drop': ['PassengerId']}
)
test_pred = automl.predict(df_test)

pd.DataFrame({
    'PassengerId': df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5)*1
}).to_csv('submit.csv', index = False)
```

* Build your own custom pipeline:

```python
import pandas as pd
from sklearn.metrics import f1_score

from lightautoml.automl.base import AutoML
from lightautoml.ml_algo.boost_lgbm import BoostLGBM
from lightautoml.ml_algo.tuning.optuna import OptunaTuner
from lightautoml.pipelines.features.lgb_pipeline import LGBSimpleFeatures
from lightautoml.pipelines.ml.base import MLPipeline
from lightautoml.pipelines.selection.importance_based import (
    ImportanceCutoffSelector,
    ModelBasedImportanceEstimator,
)
from lightautoml.reader.base import PandasToPandasReader
from lightautoml.tasks import Task

N_THREADS = 4      # threads count for LightGBM models
N_FOLDS = 5        # number of cross-validation folds
RANDOM_STATE = 42  # fixed random state for reproducibility

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')

# define that machine learning problem is binary classification
task = Task('binary')

reader = PandasToPandasReader(task, cv=N_FOLDS, random_state=RANDOM_STATE)

# create a feature selector
model0 = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 64, 'seed': 42, 'num_threads': N_THREADS}
)
pipe0 = LGBSimpleFeatures()
mbie = ModelBasedImportanceEstimator()
selector = ImportanceCutoffSelector(pipe0, model0, mbie, cutoff=0)

# build first level pipeline for AutoML
pipe = LGBSimpleFeatures()
# stop after 20 iterations or after 30 seconds
params_tuner1 = OptunaTuner(n_trials=20, timeout=30)
model1 = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 128, 'seed': 1, 'num_threads': N_THREADS}
)
model2 = BoostLGBM(
    default_params={'learning_rate': 0.025, 'num_leaves': 64, 'seed': 2, 'num_threads': N_THREADS}
)
pipeline_lvl1 = MLPipeline([
    (model1, params_tuner1),
    model2
], pre_selection=selector, features_pipeline=pipe, post_selection=None)

# build second level pipeline for AutoML
pipe1 = LGBSimpleFeatures()
model = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 64, 'max_bin': 1024, 'seed': 3, 'num_threads': N_THREADS},
    freeze_defaults=True
)
pipeline_lvl2 = MLPipeline([model], pre_selection=None, features_pipeline=pipe1, post_selection=None)

# build AutoML pipeline
automl = AutoML(reader, [
    [pipeline_lvl1],
    [pipeline_lvl2],
], skip_conn=False)

# train AutoML and get predictions
oof_pred = automl.fit_predict(df_train, roles={'target': 'Survived', 'drop': ['PassengerId']})
test_pred = automl.predict(df_test)

pd.DataFrame({
    'PassengerId': df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5)*1
}).to_csv('submit.csv', index=False)
```
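As an optional follow-up to the custom pipeline above (a sketch that reuses selector, oof_pred and df_train from that example; selected_features and get_features_score are assumed to be available on the selector, as shown in the LightAutoML tutorials), you can inspect which features survived selection and score the out-of-fold predictions:

```python
# features kept by the importance-based selector and their importance scores
print(selector.selected_features)
print(selector.get_features_score())

# out-of-fold F1 of the two-level ensemble
oof_f1 = f1_score(df_train['Survived'].values, (oof_pred.data[:, 0] > 0.5) * 1)
print(f'OOF F1 score: {oof_f1:.4f}')
```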

The LightAutoML framework has a lot of ready-to-use parts and extensive customization options. To learn more, check out the Resources section.

Reference papers

Anton Vakhrushev, Alexander Ryzhkov, Dmitry Simakov, Rinchin Damdinov, Maxim Savchenko, Alexander Tuzhilin "LightAutoML: AutoML Solution for a Large Financial Services Ecosystem". arXiv:2109.01528, 2021.

Installation

Installation via pip from PyPI

To install the LAMA framework on your machine:

```bash
# Install base functionality:
pip install -U lightautoml

# Partial installation is also available.
# Extra dependencies: ['nlp', 'cv', 'report']
# Or use 'all' to install full functionality, for example:
pip install -U lightautoml[nlp]
```
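
To check that the package is importable after installation (an optional sketch; it assumes lightautoml exposes a __version__ attribute, as most PyPI packages do):

```python
# quick smoke test after installation
import lightautoml

print(lightautoml.__version__)
```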

Additionally, run the following commands to generate reports in PDF format:

```bash
# MacOS
brew install cairo pango gdk-pixbuf libffi

# Debian / Ubuntu
sudo apt-get install build-essential libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev shared-mime-info

# Fedora
sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2

# Windows
# follow this tutorial: https://weasyprint.readthedocs.io/en/stable/install.html#windows
```

Installation from source code

First of all, you need to install git and poetry.

```bash
# Load LAMA source code
git clone https://github.com/sberbank-ai-lab/LightAutoML.git

cd LightAutoML/

# !!!Choose only one item!!!

# 1. Global installation: don't create a virtual environment
poetry config virtualenvs.create false --local

# 2. Recommended: create a virtual environment inside your project directory
poetry config virtualenvs.in-project true

# For more information, read the poetry docs

# Install LAMA
poetry lock
poetry install
```

Resources

Important 1: for production you do not need the profiler (it increases run time and memory consumption), so please do not turn it on - it is off by default.

Important 2: to take a look at the report after the run, please comment out the last line of the demo, which contains the report deletion command.

Contributing to LightAutoML

If you are interested in contributing to LightAutoML, please read the Contributing Guide to get started.

Questions / Issues / Suggestions

Seek prompt advice in the Slack community or the Telegram group.

Open bug reports and feature requests on GitHub issues.

License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.
