A tool for bulk text comparison and analysis


This is a new version of Superfastmatch, written in C++ to improve matching performance, with an index held entirely in memory to improve response times.

The point of the software is to index large amounts of text in memory. There is therefore little reason to run it on a 32-bit OS with a 4GB cap on memory, and a 64-bit OS is assumed.
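
Since a 64-bit OS is assumed, a quick check before building may save some time. The following is a minimal sketch using the standard getconf utility; the function name is_64bit is hypothetical and is not part of superfastmatch.

```shell
#!/bin/sh
# Hypothetical helper (not part of superfastmatch): succeeds when the
# given word size, in bits, is 64.
is_64bit() {
    [ "$1" -eq 64 ]
}

# getconf LONG_BIT prints the size of a C long in bits (32 or 64).
bits=$(getconf LONG_BIT)
if is_64bit "$bits"; then
    echo "64-bit userland detected"
else
    echo "warning: ${bits}-bit userland; a 64-bit OS is assumed" >&2
fi
```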

The process for installation is as follows:


Superfastmatch depends on these libraries:

Google gflags

Google perftools

Google ctemplate

Google sparsehash


Kyoto Cabinet

Kyoto Tycoon

You might be able to get away with installing the .deb packages on the listed project pages, but this is untested.

The easier route is to run:


and wait for everything to build. The script will ask you for your sudo password, which is required to install the libraries.

On Ubuntu you'll need to do this first:

sudo apt-get install libunwind7-dev mercurial curl build-essential zlib1g-dev

And you might also need a:

sudo ldconfig

after the script has finished.
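
To confirm that the dynamic linker can see the freshly built libraries after running ldconfig, something like the following may help. It is only a sketch: the library file names (libgflags, libtcmalloc, libctemplate, libkyotocabinet, libkyototycoon) are assumptions based on the names these projects usually install, sparsehash is left out because it is header-only, and missing_libs is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: reads `ldconfig -p` style output on stdin and
# prints each expected library name that does not appear in it.
missing_libs() {
    cache=$(cat)
    for lib in libgflags libtcmalloc libctemplate libkyotocabinet libkyototycoon; do
        # Library names are assumed; adjust to match your installation.
        printf '%s\n' "$cache" | grep -q "$lib" || echo "$lib"
    done
}

# Typical use: list the linker cache and report anything not found.
if command -v ldconfig >/dev/null 2>&1; then
    ldconfig -p | missing_libs
fi
```

An empty result suggests that every library in the list was found in the linker cache.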

On Fedora/Amazon AMI, the following will allow the build to complete:

sudo yum update
sudo yum install git
sudo yum install svn
sudo yum install gcc
sudo yum install gcc-c++
sudo yum install zlib-devel
sudo yum install mercurial
tar xzf libunwind-0.99.tar.gz
cd libunwind-0.99
./configure && make && sudo make install

and you might have to add /usr/local/lib to /etc/


After the libraries are installed, you can run:

make check

to run the unit tests for the code.


After that you can run:

make run

to get a superfastmatch instance running. Nothing is configurable from the command line yet. Coming soon...

Visit to test the interface.


For a quick introduction to what can be found with superfastmatch try this:

If you have a machine with less than 8GB of memory and fewer than 4 cores, run:

./superfastmatch -debug -hash_width 24 -reset -slot_count 2 -thread_count 2 -window_size 30

otherwise this will be much faster:

./superfastmatch -debug -reset -window_size 30
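
The choice between the two commands above can be scripted. The sketch below assumes a Linux system with /proc/meminfo and nproc available; choose_flags is a hypothetical helper, and its thresholds simply restate the 8GB and 4 cores rule given above.

```shell
#!/bin/sh
# Hypothetical helper: prints the superfastmatch flags suggested above,
# given total memory in kB and the number of CPU cores.
choose_flags() {
    mem_kb=$1
    cores=$2
    eight_gb_kb=$((8 * 1024 * 1024))
    if [ "$mem_kb" -lt "$eight_gb_kb" ] && [ "$cores" -lt 4 ]; then
        echo "-debug -hash_width 24 -reset -slot_count 2 -thread_count 2 -window_size 30"
    else
        echo "-debug -reset -window_size 30"
    fi
}

# Linux-specific: MemTotal is reported in kB in /proc/meminfo.
if [ -r /proc/meminfo ] && command -v nproc >/dev/null 2>&1; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "./superfastmatch $(choose_flags "$mem_kb" "$(nproc)")"
fi
```

The script only prints the suggested command rather than running it, so the flags can be reviewed first.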

And then finally, in another terminal window, run:


to load some example documents and associate them with each other. You can view the results in the browser at:


See contrib/init.d for an example init.d script. It makes use of fuser, which may require:

sudo apt-get install psmisc


All feedback welcome. Either create an issue here or ask a question on the mailing list.

Known Issues

This is still an early release, halfway between Alpha and Beta! There are known issues with large documents affecting the document list and detail pages, and the full REST specification is not yet implemented. Lots of fixes, new features and performance improvements are currently in development, so keep checking the commit log!


Thanks to Martin Moore and Ben Campbell at Media Standards Trust for ongoing support for the project and to Tom Lee, Drew Vogel, Kaitlin Lee and James Turk at Sunlight Labs for being willing testers, early adopters and proponents of open source!

Thanks also to Mikio Hirabayashi for assistance and the excellent open source Kyoto Cabinet and Kyoto Tycoon, to Craig Silverstein for accepting and improving this patch, to Neil Fraser for useful hints and inspiration from Diff-Match-Patch and to Austin Appleby for hashing advice.
