


.. figure:: https://raw.githubusercontent.com/spotify/luigi/master/doc/luigi.png
   :alt: Luigi Logo
   :align: center

.. image:: https://img.shields.io/travis/spotify/luigi/master.svg?style=flat
   :target: https://travis-ci.org/spotify/luigi

.. image:: https://img.shields.io/codecov/c/github/spotify/luigi/master.svg?style=flat
   :target: https://codecov.io/gh/spotify/luigi?branch=master

.. image:: https://landscape.io/github/spotify/luigi/master/landscape.svg?style=flat
   :target: https://landscape.io/github/spotify/luigi/master

.. image:: https://img.shields.io/pypi/v/luigi.svg?style=flat
   :target: https://pypi.python.org/pypi/luigi

.. image:: https://img.shields.io/pypi/l/luigi.svg?style=flat
   :target: https://pypi.python.org/pypi/luigi

Luigi is a Python (3.6, 3.7 tested) package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, failure handling, command line integration, and much more.

## Getting Started

Run ``pip install luigi`` to install the latest stable version from PyPI. Documentation for the latest release is hosted on readthedocs.

Run ``pip install luigi[toml]`` to install Luigi with TOML-based configs support.

For the bleeding edge code, run ``pip install git+https://github.com/spotify/luigi.git``. Bleeding edge documentation is also available.

## Background

The purpose of Luigi is to address all the plumbing typically associated with long-running batch processes. You want to chain many tasks, automate them, and failures _will_ happen. These tasks can be anything, but are typically long-running things like Hadoop jobs, dumping data to/from databases, running machine learning algorithms, or anything else.

There are other software packages that focus on lower level aspects of data processing, like Hive, Pig, or Cascading. Luigi is not a framework to replace these. Instead it helps you stitch many tasks together, where each task can be a Hive query, a Hadoop job in Java, a Spark job in Scala or Python, dumping a table from a database, or anything else. It's easy to build up long-running pipelines that comprise thousands of tasks and take days or weeks to complete. Luigi takes care of a lot of the workflow management so that you can focus on the tasks themselves and their dependencies.

You can build pretty much any task you want, but Luigi also comes with a _toolbox_ of several common task templates that you can use. It includes support for running Python mapreduce jobs in Hadoop, as well as Hive and Pig jobs. It also comes with file system abstractions for HDFS and local files that ensure all file system operations are atomic. This is important because it means your data pipeline will not crash in a state containing partial data.

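
To make the point about atomic file operations concrete, here is a minimal sketch of the general technique: write to a temporary file, then rename it into place once the write has succeeded. This is plain standard-library Python, not Luigi's own implementation, and the helper name ``atomic_write`` is hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def atomic_write(path):
    """Yield a file handle whose contents appear at `path` only on success.

    Hypothetical helper for illustration; not part of Luigi.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final rename
    # stays on the same file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            yield handle
        # Move the finished file into place in a single step, so readers
        # see either the old state or the complete new file.
        os.replace(tmp_path, path)
    except BaseException:
        # The write failed: discard the partial file and leave `path` untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the body of the ``with atomic_write(...)`` block raises, the target path is never created, so a downstream step that checks for the file's existence will not mistake a half-written file for a finished one.
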
## Visualiser page

The Luigi server comes with a web interface too, so you can search and filter among all your tasks.

.. figure:: https://raw.githubusercontent.com/spotify/luigi/master/doc/visualiser_front_page.png
   :alt: Visualiser page

## Dependency graph example

Just to give you an idea of what Luigi does, this is a screenshot from something we are running in production. Using Luigi's visualiser, we get a nice visual overview of the dependency graph of the workflow. Each node represents a task which has to be run. Green tasks are already completed whereas yellow tasks are yet to be run. Most of these tasks are Hadoop jobs, but there are also some things that run locally and build up data files.

.. figure:: https://raw.githubusercontent.com/spotify/luigi/master/doc/user_recs.png
   :alt: Dependency graph

## Philosophy

Conceptually, Luigi is similar to GNU Make where you have certain tasks and these tasks in turn may have dependencies on other tasks. There are also some similarities to Oozie and Azkaban. One major difference is that Luigi is not just built specifically for Hadoop, and it's easy to extend it with other kinds of tasks.

Everything in Luigi is in Python. Instead of XML configuration or similar external data files, the dependency graph is specified _within Python_. This makes it easy to build up complex dependency graphs of tasks, where the dependencies can involve date algebra or recursive references to other versions of the same task. However, the workflow can trigger things not in Python, such as running Pig scripts or scp'ing files.

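
As a rough illustration of the Make-like model described above, the sketch below expresses a dependency graph in plain Python, with a task that depends on the previous day's instance of itself (date algebra plus a recursive reference). The method names ``requires``, ``output``, ``complete`` and ``run`` mirror those on Luigi tasks, but the ``DailyReport`` class and the ``build`` function are hypothetical and use only the standard library; this is not Luigi's scheduler.

```python
import datetime
import os
import tempfile


class DailyReport:
    """Hypothetical task: one report per day, built on yesterday's report."""

    def __init__(self, date, first_date, out_dir):
        self.date = date
        self.first_date = first_date
        self.out_dir = out_dir

    def output(self):
        return os.path.join(self.out_dir, f"report-{self.date.isoformat()}.txt")

    def requires(self):
        # Date algebra: depend on the same task for the previous day,
        # stopping the recursion at first_date.
        if self.date <= self.first_date:
            return []
        previous = self.date - datetime.timedelta(days=1)
        return [DailyReport(previous, self.first_date, self.out_dir)]

    def complete(self):
        # As with Make, a task counts as done when its output exists.
        return os.path.exists(self.output())

    def run(self):
        with open(self.output(), "w") as handle:
            handle.write(f"report for {self.date.isoformat()}\n")


def build(task, log):
    """Run `task` after its dependencies, skipping anything already complete."""
    if task.complete():
        return
    for dependency in task.requires():
        build(dependency, log)
    task.run()
    log.append(task.date)


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    executed = []
    start = datetime.date(2020, 1, 1)
    build(DailyReport(datetime.date(2020, 1, 3), start, out_dir), executed)
    print(executed)  # dates of the tasks that ran, in dependency order
```

Running ``build`` a second time for a later date only executes the tasks whose output files are missing, which is the same "skip what is already done" behaviour the Make comparison refers to.
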
## Who uses Luigi?

We use Luigi internally at Spotify to run thousands of tasks every day, organized in complex dependency graphs. Most of these tasks are Hadoop jobs. Luigi provides an infrastructure that powers all kinds of stuff including recommendations, toplists, A/B test analysis, external reports, internal dashboards, etc.

Since Luigi is open source and without any registration walls, the exact number of Luigi users is unknown. But based on the number of unique contributors, we expect hundreds of enterprises to use it. Some users have written blog posts or held presentations about Luigi:

- Spotify (presentation, 2014)
- Foursquare (presentation, 2013)
- Mortar Data (Datadog) (documentation / tutorial)
- Stripe (presentation, 2014)
- Buffer (blog, 2014)
- SeatGeek (blog, 2015)
- Treasure Data (blog, 2015)
- Growth Intelligence (presentation, 2015)
- AdRoll (blog, 2015)
- 17zuoye (presentation, 2015)
- Custobar (presentation, 2016)
- Blendle (presentation)
- TrustYou (presentation, 2015)
- Groupon / OrderUp (alternative implementation)
- Red Hat - Marketing Operations (blog, 2017)
- GetNinjas (blog, 2017)
- voyages-sncf.com (presentation, 2017)
- Open Targets (blog, 2017)
- Leipzig University Library (presentation, 2016) / (project)
- Synetiq (presentation, 2017)
- Glossier (blog, 2018)
- Data Revenue (blog, 2018)
- Uppsala University (tutorial) / (presentation, 2015) / (slides, 2015) / (poster, 2015) / (paper, 2016) / (project)
- GIPHY (blog, 2019)
- xtream (blog, 2019)
- CIAN (presentation, 2019)

Some more companies are using Luigi but haven't had a chance yet to write about it:

- Schibsted
- enbrite.ly
- Dow Jones / The Wall Street Journal
- Hotels.com
- Newsela
- Squarespace
- OAO
- Grovo
- Weebly
- Deloitte
- Stacktome
- LINX+Neemu+Chaordic
- Foxberry
- Okko
- ISVWorld
- Big Data
- Movio
- Bonnier News
- Starsky Robotics
- BaseTIS
- Hopper
- VOYAGE GROUP/Zucks
- Textpert
- Whizar
- xtream
- Skyscanner
- Jodel
- Mekar
- M3

We're more than happy to have your company added here. Just send a PR on GitHub.

## External links

- Mailing List for discussions and asking questions. (Google Groups)
- Releases (PyPI)
- Source code (GitHub)
- Hubot Integration plugin for Slack, Hipchat, etc (GitHub)

## Authors

Luigi was built at Spotify, mainly by Erik Bernhardsson and Elias Freider. Many other people have contributed. Arash Rouhani is currently the chief maintainer of Luigi.
