
seatgeek/fuzzywuzzy: Fuzzy String Matching in Python

7.1K stars, 762 forks, 379 commits, 24 releases. Last release: 0.18.0 (5 months ago). License: GNU General Public License v2.0.


.. image:: https://travis-ci.org/seatgeek/fuzzywuzzy.svg?branch=master
    :target: https://travis-ci.org/seatgeek/fuzzywuzzy

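FuzzyWuzzy
==========

Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package. (Strictly, the Levenshtein-based scores come from python-Levenshtein when it is installed; without it, FuzzyWuzzy falls back to the standard library's ``difflib.SequenceMatcher``.)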
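
For readers new to the metric: Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. Below is a minimal, standard-library-only sketch of the textbook dynamic-programming algorithm. It illustrates the metric only; it is not FuzzyWuzzy's internal code, and the function name ``levenshtein`` is our own.

.. code:: python

    def levenshtein(a: str, b: str) -> int:
        """Return the minimum number of single-character edits turning a into b."""
        # prev[j] holds the distance between the previous prefix of `a`
        # and the first j characters of `b` (one row of the DP table).
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]  # distance from a[:i] to the empty string
            for j, cb in enumerate(b, start=1):
                cost = 0 if ca == cb else 1
                curr.append(min(
                    prev[j] + 1,         # delete ca
                    curr[j - 1] + 1,     # insert cb
                    prev[j - 1] + cost,  # substitute ca -> cb (free if equal)
                ))
            prev = curr
        return prev[-1]


    print(levenshtein("kitten", "sitting"))                  # 3
    print(levenshtein("this is a test", "this is a test!"))  # 1

Only two rows of the table are kept at a time, so memory grows with the length of ``b`` rather than with the product of both lengths.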

Requirements
============

- Python 2.7 or higher
- difflib
- python-Levenshtein (optional; provides a 4-10x speedup in string matching, though it may produce differing results for certain cases)

For testing
~~~~~~~~~~~

- pycodestyle
- hypothesis
- pytest
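
Because python-Levenshtein is optional, it can be useful to check which backend a given environment will use. This Python 3 sketch only asks whether the ``Levenshtein`` module (the import name of python-Levenshtein) can be found, via the standard library's ``importlib.util.find_spec``; the helper ``matching_backend`` is hypothetical and not part of FuzzyWuzzy.

.. code:: python

    import importlib.util


    def matching_backend() -> str:
        """Guess which matcher FuzzyWuzzy will use in this environment.

        Assumes python-Levenshtein is importable as ``Levenshtein``; without it,
        FuzzyWuzzy falls back to the pure-Python ``difflib.SequenceMatcher``.
        """
        if importlib.util.find_spec("Levenshtein") is not None:
            return "python-Levenshtein"
        return "difflib"


    print(matching_backend())

Running the same check on every machine that produces scores is a cheap way to explain score differences between them.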

Installation
============

Using PIP via PyPI

.. code:: bash

    pip install fuzzywuzzy

or the following to install ``python-Levenshtein`` too (the quotes keep shells such as zsh from expanding the brackets):

.. code:: bash

    pip install "fuzzywuzzy[speedup]"

Using PIP via GitHub

.. code:: bash

    pip install git+https://github.com/seatgeek/fuzzywuzzy.git@0.18.0#egg=fuzzywuzzy

Adding to your ``requirements.txt`` file (run ``pip install -r requirements.txt`` afterwards)

.. code:: bash

    git+ssh://git@github.com/seatgeek/fuzzywuzzy.git@0.18.0#egg=fuzzywuzzy

Manually via Git

.. code:: bash

    git clone https://github.com/seatgeek/fuzzywuzzy.git fuzzywuzzy
    cd fuzzywuzzy
    python setup.py install

Usage
=====

.. code:: python

    >>> from fuzzywuzzy import fuzz
    >>> from fuzzywuzzy import process

Simple Ratio
~~~~~~~~~~~~

.. code:: python

    >>> fuzz.ratio("this is a test", "this is a test!")
    97
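
The 97 above can be checked by hand. ``difflib.SequenceMatcher.ratio()`` returns 2*M/T, where M is the number of matching characters and T is the total length of both strings: here M = 14 and T = 14 + 15 = 29, so the ratio is about 0.9655, which rounds to 97. The sketch below approximates ``fuzz.ratio`` for the pure-Python difflib backend only; ``simple_ratio`` is our name, and the library's own input checks are left out.

.. code:: python

    from difflib import SequenceMatcher


    def simple_ratio(s1: str, s2: str) -> int:
        """Whole-string similarity on a 0-100 scale (difflib-based sketch)."""
        # ratio() is 2 * matching_chars / (len(s1) + len(s2)), a float in [0, 1]
        return round(100 * SequenceMatcher(None, s1, s2).ratio())


    print(simple_ratio("this is a test", "this is a test!"))  # 97

With python-Levenshtein installed, ``fuzz.ratio`` uses a different matcher, so individual scores may differ slightly from this sketch.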

Partial Ratio
~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.partial_ratio("this is a test", "this is a test!")
    100
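
``partial_ratio`` scores the best-matching substring rather than the whole strings, which is why the trailing "!" costs nothing. A simplified way to express that idea is to slide the shorter string across every equal-length window of the longer one and keep the best score. This brute-force version is our own illustration (``partial_ratio_sketch`` is a hypothetical name); FuzzyWuzzy only tries the windows suggested by difflib's matching blocks, so the two can disagree on some inputs.

.. code:: python

    from difflib import SequenceMatcher


    def partial_ratio_sketch(s1: str, s2: str) -> int:
        """Best 0-100 similarity between the shorter string and any
        same-length window of the longer string (brute-force sketch)."""
        shorter, longer = sorted((s1, s2), key=len)
        if not shorter:
            return 0  # this sketch treats empty input as no match
        best = 0.0
        # Slide a window the length of `shorter` across `longer`.
        for start in range(len(longer) - len(shorter) + 1):
            window = longer[start:start + len(shorter)]
            best = max(best, SequenceMatcher(None, shorter, window).ratio())
        return round(100 * best)


    print(partial_ratio_sketch("this is a test", "this is a test!"))  # 100

Trying every window is quadratic in the worst case, which is one reason a real implementation would narrow the candidate windows first.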

Token Sort Ratio
~~~~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    91
    >>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    100
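
``token_sort_ratio`` makes word order irrelevant: each string is normalized, split into tokens, sorted alphabetically, and re-joined before the comparison. The sketch below follows that pipeline with ``difflib``. The normalization (lowercase, runs of non-alphanumeric characters turned into spaces) is our approximation of FuzzyWuzzy's default processing, and ``normalize`` and ``token_sort_sketch`` are hypothetical names.

.. code:: python

    import re
    from difflib import SequenceMatcher


    def normalize(s: str) -> str:
        """Lowercase and turn runs of non-alphanumeric characters into single spaces."""
        return re.sub(r"\W+", " ", s).lower().strip()


    def token_sort_sketch(s1: str, s2: str) -> int:
        """0-100 similarity after sorting each string's tokens alphabetically."""
        sorted1 = " ".join(sorted(normalize(s1).split()))
        sorted2 = " ".join(sorted(normalize(s2).split()))
        return round(100 * SequenceMatcher(None, sorted1, sorted2).ratio())


    print(token_sort_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
    print(token_sort_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))        # 84

Sorting removes the effect of word order but not of repeated words, which is why the second pair still scores below 100.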

Token Set Ratio
~~~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    84
    >>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    100
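
``token_set_ratio`` also ignores repeated words. In outline: split both strings into sets of tokens, build their sorted intersection, append each side's leftover tokens to it, and return the best of the three pairwise comparisons. The sketch below follows that outline with ``difflib``; the lowercase-and-split normalization and the names ``_ratio`` and ``token_set_sketch`` are our assumptions, not the library's source.

.. code:: python

    import re
    from difflib import SequenceMatcher


    def _ratio(a: str, b: str) -> int:
        """Whole-string 0-100 similarity via difflib."""
        return round(100 * SequenceMatcher(None, a, b).ratio())


    def token_set_sketch(s1: str, s2: str) -> int:
        """0-100 similarity based on shared and leftover token sets."""
        tokens1 = set(re.sub(r"\W+", " ", s1).lower().split())
        tokens2 = set(re.sub(r"\W+", " ", s2).lower().split())
        common = " ".join(sorted(tokens1 & tokens2))
        # Shared tokens first, then the tokens unique to each side.
        combined1 = " ".join([common, *sorted(tokens1 - tokens2)]).strip()
        combined2 = " ".join([common, *sorted(tokens2 - tokens1)]).strip()
        return max(
            _ratio(common, combined1),
            _ratio(common, combined2),
            _ratio(combined1, combined2),
        )


    print(token_set_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100

One consequence of this scheme: when every token of one string also appears in the other, the intersection alone matches that string exactly, so the score is 100.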

Process
~~~~~~~

.. code:: python

    >>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
    >>> process.extract("new york jets", choices, limit=2)
    [('New York Jets', 100), ('New York Giants', 78)]
    >>> process.extractOne("cowboys", choices)
    ('Dallas Cowboys', 90)
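
``process.extract`` scores the query against every choice and returns the best ``limit`` results, highest first. A stdlib-only sketch of that ranking step follows. To keep it short it scores with difflib's ratio on lowercased strings, an assumption made here for illustration; FuzzyWuzzy's default scorer is ``fuzz.WRatio``, so this sketch's scores will not match the ones above. ``extract_sketch`` is a hypothetical name.

.. code:: python

    import heapq
    from difflib import SequenceMatcher


    def extract_sketch(query, choices, limit=5):
        """Return up to `limit` (choice, score) pairs, best score first."""
        q = query.lower()
        scored = (
            (choice, round(100 * SequenceMatcher(None, q, choice.lower()).ratio()))
            for choice in choices
        )
        # nlargest keeps only the top `limit` items instead of sorting everything.
        return heapq.nlargest(limit, scored, key=lambda pair: pair[1])


    choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
    print(extract_sketch("new york jets", choices, limit=2))

``heapq.nlargest`` keeps ties in their original order, so equally scored choices come back in the order they were supplied.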

You can also pass additional parameters to the ``extractOne`` method to make it use a specific scorer. A typical use case is matching file paths (here ``songs`` is a list of file paths):

.. code:: python

    >>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
    ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
    >>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
    ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)
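
The ``scorer`` argument is a function that takes two strings and returns a 0-100 score, so changing it can change which choice wins without touching the data. The sketch below imitates that interface with a hypothetical ``extract_one_sketch`` and two stdlib-only scorers (``plain_scorer`` and ``token_sort_scorer``, also our names). It does not reproduce the file-path example, since the contents of ``songs`` are not shown.

.. code:: python

    from difflib import SequenceMatcher
    from typing import Callable, Iterable, Optional, Tuple


    def plain_scorer(a: str, b: str) -> int:
        """Character-level 0-100 similarity; sensitive to word order."""
        return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


    def token_sort_scorer(a: str, b: str) -> int:
        """Same comparison after sorting each string's words alphabetically."""
        return plain_scorer(" ".join(sorted(a.lower().split())),
                            " ".join(sorted(b.lower().split())))


    def extract_one_sketch(
        query: str,
        choices: Iterable[str],
        scorer: Callable[[str, str], int] = plain_scorer,
    ) -> Optional[Tuple[str, int]]:
        """Return the best (choice, score) pair, or None when choices is empty."""
        scored = ((choice, scorer(query, choice)) for choice in choices)
        return max(scored, key=lambda pair: pair[1], default=None)


    choices = ["fuzzy wuzzy was a bear", "this is a test"]
    query = "bear a was wuzzy fuzzy"
    print(extract_one_sketch(query, choices))                            # word order counts
    print(extract_one_sketch(query, choices, scorer=token_sort_scorer))  # ('fuzzy wuzzy was a bear', 100)

Keeping the scorer as a plain function argument means new matching strategies can be tried without changing the lookup code.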


Known Ports
===========

FuzzyWuzzy is being ported to other languages too! Here are a few ports we know about:

- Java: xpresso's fuzzywuzzy implementation
- Java: fuzzywuzzy (java port)
- Rust: fuzzyrusty (Rust port)
- JavaScript: fuzzball.js (JavaScript port)
- C++: Tmplt/fuzzywuzzy
- C#: fuzzysharp (.Net port)
- Go: go-fuzzywuzz (Go port)
- Free Pascal: FuzzyWuzzy.pas (Free Pascal port)
- Kotlin multiplatform: FuzzyWuzzy-Kotlin
- R: fuzzywuzzyR (R port)
