
License: MIT
Efficient-Apriori

An efficient pure Python implementation of the Apriori algorithm. Works with Python 3.6+.

The Apriori algorithm uncovers hidden structures in categorical data. The classical example is a database containing purchases from a supermarket. Every purchase has a number of items associated with it. We would like to uncover association rules such as

{bread, eggs} -> {bacon}
from the data. This is the goal of association rule learning, and the Apriori algorithm is arguably the most famous algorithm for this problem. This repository contains an efficient, well-tested implementation of the Apriori algorithm as described in the original paper by Agrawal et al., published in 1994.
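To make the terminology concrete, here is a small, stdlib-only sketch of how the support and confidence of a rule such as {bread, eggs} -> {bacon} can be computed. The purchases below are made-up illustration data, and the helper names support and confidence are hypothetical; they are not part of the Efficient-Apriori API.

```python
from typing import List, Set


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs: Set[str], rhs: Set[str], transactions: List[Set[str]]) -> float:
    """Estimate of P(rhs | lhs): support of (lhs | rhs) divided by support of lhs."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)


# Hypothetical supermarket purchases (illustration data only)
purchases = [
    {"bread", "eggs", "bacon"},
    {"bread", "eggs", "bacon", "milk"},
    {"bread", "eggs"},
    {"milk", "bacon"},
]

print(support({"bread", "eggs", "bacon"}, purchases))       # 0.5
print(confidence({"bread", "eggs"}, {"bacon"}, purchases))  # about 0.667
```

Here {bread, eggs} appears in 3 of 4 purchases and {bread, eggs, bacon} in 2 of 4, so the rule holds in two thirds of the cases where its left-hand side occurs.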

The code is stable and in widespread use. It's cited in the book "Mastering Machine Learning Algorithms" by Bonaccorso.


Here's a minimal working example. Notice that in every transaction containing eggs, bacon is present too. Therefore, the rule

{eggs} -> {bacon}

is returned with 100 % confidence.
from efficient_apriori import apriori
transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]
itemsets, rules = apriori(transactions, min_support=0.5, min_confidence=1)
print(rules)  # [{eggs} -> {bacon}, {soup} -> {bacon}]
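For intuition about what the first phase of apriori computes, here is a simplified, stdlib-only sketch of the level-wise join-and-prune search for frequent itemsets. It is not Efficient-Apriori's code: the function name frequent_itemsets and its output layout (itemset tuples mapped to counts, grouped by itemset size) are assumptions for illustration, and it makes no attempt at the library's optimizations.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Itemset = Tuple[str, ...]


def frequent_itemsets(transactions: List[Tuple[str, ...]],
                      min_support: float) -> Dict[int, Dict[Itemset, int]]:
    """Level-wise search: return {size: {itemset: count}} for all frequent itemsets."""
    sets = [set(t) for t in transactions]
    min_count = min_support * len(sets)

    # Level 1: count single items in one pass over the data
    counts: Dict[Itemset, int] = {}
    for t in sets:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}

    result: Dict[int, Dict[Itemset, int]] = {}
    k = 1
    while current:
        result[k] = dict(sorted(current.items()))
        k += 1
        # Join step: merge sorted (k-1)-itemsets that share their first k-2 items
        candidates: Set[Itemset] = set()
        for a, b in combinations(sorted(current), 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                # Prune step: every (k-1)-subset of a frequent itemset must be frequent
                if all(sub in current for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
        # Count the surviving candidates in one more pass over the data
        current = {}
        for cand in candidates:
            c = sum(1 for t in sets if set(cand) <= t)
            if c >= min_count:
                current[cand] = c
    return result


transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]
print(frequent_itemsets(transactions, min_support=0.5))
# {1: {('bacon',): 3, ('eggs',): 2, ('soup',): 2}, 2: {('bacon', 'eggs'): 2, ('bacon', 'soup'): 2}}
```

The candidate {bacon, eggs, soup} is pruned without a data pass because its subset {eggs, soup} is not frequent; this pruning is what keeps the search tractable.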

If your data is in a pandas DataFrame, you must convert it to a list of tuples. Do you have missing values, or does the algorithm run for a long time? See this comment. More examples are included below.
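One way to do that conversion is sketched below. It assumes each DataFrame row lists the items of one transaction, with None or NaN filling the cells of shorter rows; dropping those cells is one possible way to handle missing values, not necessarily the approach in the comment referenced above. The helper names are hypothetical, and plain tuples stand in for DataFrame rows so the sketch runs without pandas. With a real DataFrame, df.itertuples(index=False, name=None) yields each row as a plain tuple.

```python
import math
from typing import Any, Iterable, List, Tuple


def is_missing(value: Any) -> bool:
    """Treat None and float NaN (pandas' usual marker for gaps) as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def rows_to_transactions(rows: Iterable[Iterable[Any]]) -> List[Tuple[Any, ...]]:
    """Turn row-like records into a list of tuples, skipping missing cells."""
    return [tuple(v for v in row if not is_missing(v)) for row in rows]


# Stand-in for df.itertuples(index=False, name=None)
rows = [
    ("eggs", "bacon", "soup"),
    ("eggs", "bacon", float("nan")),
    ("soup", None, "banana"),
]
transactions = rows_to_transactions(rows)
print(transactions)
# [('eggs', 'bacon', 'soup'), ('eggs', 'bacon'), ('soup', 'banana')]
```

Note that this check does not recognize pandas' pd.NA; if your frame uses nullable dtypes, pandas.isna is the more general test.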


The software is available through GitHub, and through PyPI. You may install the software using

pip install efficient-apriori


You are very welcome to scrutinize the code and make pull requests if you have suggestions and improvements. Your submitted code must be PEP8 compliant, and all tests must pass. Contributors: CRJFisher

More examples

Filtering and sorting association rules

It's possible to filter and sort the returned list of association rules.

from efficient_apriori import apriori
transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]
itemsets, rules = apriori(transactions, min_support=0.2, min_confidence=1)

# Print out every rule with 2 items on the left hand side,
# 1 item on the right hand side, sorted by lift
rules_rhs = filter(lambda rule: len(rule.lhs) == 2 and len(rule.rhs) == 1, rules)
for rule in sorted(rules_rhs, key=lambda rule: rule.lift):
    print(rule)  # Prints the rule and its confidence, support, lift, ...
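Lift compares a rule's confidence with how common its right-hand side is on its own: lift = confidence / support(rhs), so a value above 1 means the two sides co-occur more often than independence would predict. The stdlib-only sketch below computes lift by hand and repeats the filter-and-sort pattern on a hypothetical SimpleRule namedtuple. It borrows the lhs, rhs and lift attribute names from the example above but is not the library's rule class.

```python
from collections import namedtuple
from typing import List, Set, Tuple

# Hypothetical stand-in for a rule object (not efficient_apriori's own class)
SimpleRule = namedtuple("SimpleRule", ["lhs", "rhs", "confidence", "lift"])


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def make_rule(lhs: Tuple[str, ...], rhs: Tuple[str, ...],
              transactions: List[Set[str]]) -> SimpleRule:
    """Compute confidence and lift for the rule lhs -> rhs."""
    confidence = (support(set(lhs) | set(rhs), transactions)
                  / support(set(lhs), transactions))
    # Lift > 1: lhs and rhs co-occur more often than if they were independent
    lift = confidence / support(set(rhs), transactions)
    return SimpleRule(lhs, rhs, confidence, lift)


transactions = [{"eggs", "bacon", "soup"},
                {"eggs", "bacon", "apple"},
                {"soup", "bacon", "banana"}]

candidates = [
    make_rule(("eggs",), ("bacon",), transactions),
    make_rule(("bacon", "eggs"), ("soup",), transactions),
    make_rule(("bacon", "soup"), ("eggs",), transactions),
    make_rule(("eggs", "soup"), ("bacon",), transactions),
]

# 2 items on the left, 1 on the right, sorted by ascending lift
two_to_one = [r for r in candidates if len(r.lhs) == 2 and len(r.rhs) == 1]
ordered = sorted(two_to_one, key=lambda r: r.lift)
for rule in ordered:
    print(rule.lhs, "->", rule.rhs, f"conf={rule.confidence:.2f}", f"lift={rule.lift:.2f}")
```

Because bacon is in every transaction, any rule with {bacon} on the right has lift exactly 1: knowing the left-hand side tells you nothing new about bacon.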

Working with large datasets

If you have data that is too large to fit in memory, you may pass a function returning a generator instead of a list. The min_support parameter will most likely have to be set to a large value, or the algorithm will take a very long time to terminate. If you have massive amounts of data, this Python implementation is likely not fast enough, and you should consult more specialized implementations.
from efficient_apriori import apriori


def data_generator(filename):
    """
    Return a function that creates a new generator each time it is called,
    since the data is iterated over several times.
    Use this approach if the data is too large to fit in memory; otherwise use a list.
    """
    def data_gen():
        with open(filename) as file:
            for line in file:
                yield tuple(k.strip() for k in line.split(','))

    return data_gen


transactions = data_generator('dataset.csv')
itemsets, rules = apriori(transactions, min_support=0.9, min_confidence=0.6)
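Why pass a function rather than a generator object? A generator can be iterated only once, while a level-wise algorithm like Apriori reads the data several times. The stdlib-only sketch below shows the difference. The names make_transaction_source (the same idea as data_generator) and count_passes (a stand-in for the algorithm's repeated passes) are hypothetical, and the small temporary CSV file is made-up demo data.

```python
import os
import tempfile
from typing import Callable, Iterator, List, Tuple

Source = Callable[[], Iterator[Tuple[str, ...]]]


def make_transaction_source(filename: str) -> Source:
    """Return a zero-argument function; each call reopens the file and yields rows anew."""
    def data_gen() -> Iterator[Tuple[str, ...]]:
        with open(filename) as file:
            for line in file:
                yield tuple(k.strip() for k in line.split(","))
    return data_gen


def count_passes(source: Source, passes: int) -> List[int]:
    """Mimic a multi-pass algorithm: count the transactions seen on each pass."""
    return [sum(1 for _ in source()) for _ in range(passes)]


# Write a tiny hypothetical CSV to a temporary file for the demonstration
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("eggs,bacon,soup\neggs,bacon,apple\nsoup,bacon,banana\n")
    path = tmp.name

source = make_transaction_source(path)
pass_counts = count_passes(source, 3)
print(pass_counts)  # [3, 3, 3]: every pass sees all rows

single = source()  # by contrast, one generator object
first = sum(1 for _ in single)
second = sum(1 for _ in single)
print(first, second)  # 3 0: exhausted after the first pass

os.remove(path)
```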

Transactions with IDs

If you need to know which transactions occurred in the frequent itemsets, set the output_transaction_ids parameter to True. This changes the output to contain ItemsetCount objects for each itemset. The objects have a members property containing the set of ids of the transactions that contain the itemset, as well as an itemset_count property. The ids are the enumeration of the transactions in the order they appear.
from efficient_apriori import apriori
transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]
itemsets, rules = apriori(transactions, output_transaction_ids=True)
# {1: {('bacon',): ItemsetCount(itemset_count=3, members={0, 1, 2}), ...
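The idea behind the members sets can be illustrated with a brute-force, stdlib-only sketch: enumerate the transactions, and for each itemset of a given size record the ids of the transactions containing it. The function name itemset_members is hypothetical, and unlike apriori it applies no minimum support and enumerates every combination, so it is only suitable for tiny examples.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def itemset_members(transactions: List[Tuple[str, ...]],
                    size: int) -> Dict[Tuple[str, ...], Set[int]]:
    """Map each itemset of `size` items to the ids of the transactions containing it."""
    members: Dict[Tuple[str, ...], Set[int]] = {}
    # Ids are the position of each transaction in the input, starting at 0
    for tid, transaction in enumerate(transactions):
        for itemset in combinations(sorted(set(transaction)), size):
            members.setdefault(itemset, set()).add(tid)
    return members


transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]

singles = itemset_members(transactions, 1)
print(singles[('bacon',)])       # {0, 1, 2}
print(len(singles[('bacon',)]))  # 3, the itemset count
pairs = itemset_members(transactions, 2)
print(pairs[('bacon', 'eggs')])  # {0, 1}
```

The count of an itemset is simply the size of its members set, which is why both can be reported together.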
