Need help with bloomfilter-rb?
Click the “chat” button below for chat support from the developer who created it, or find similar developers for support.

About the developer

igrigorik
449 Stars 61 Forks 114 Commits 9 Opened issues

Description

BloomFilter(s) in Ruby: Native counting filter + Redis counting/non-counting filters

Services available

!
?

Need anything else?

Contributors list

# 21,531
Ruby
Chrome
html5
rubyml
49 commits
# 65,785
Ruby
twilio
Shell
padrino
9 commits
# 334,655
ml
svm-cla...
Shell
testng
6 commits
# 28,113
Ruby
Shell
mruby
Rails
4 commits
# 588,346
C
Ruby
2 commits
# 485
Ruby
Rails
activer...
pypi
2 commits
# 117,747
Ruby
Shell
Perl
capistr...
2 commits
# 185,862
Ruby
Shell
C
2 commits
# 313,258
mri
Shell
Nim
Haxe
1 commit
# 119,098
Ruby
Redis
Rails
postgre...
1 commit
# 115,011
Elixir
C
identif...
Rails
1 commit
# 216,261
C
Shell
Scala
Jupyter...
1 commit

BloomFilter(s) in Ruby

  • Native (MRI/C) counting bloom filter
  • Redis-backed getbit/setbit non-counting bloom filter
  • Redis-backed set-based counting (+TTL) bloom filter

Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. For more detail, check the wikipedia article. Instead of using k different hash functions, this implementation seeds the CRC32 hash with k different initial values (0, 1, ..., k-1). This may or may not give you a good distribution, it all depends on the data.

Performance of the Bloom filter depends on a number of variables:

  • size of the bit array
  • size of the counter bucket
  • number of hash functions

Resources


MRI/C API Example

MRI/C implementation which creates an in-memory filter which can be saved and reloaded from disk.

require 'bloomfilter-rb'

bf = BloomFilter::Native.new(:size => 100, :hashes => 2, :seed => 1, :bucket => 3, :raise => false) bf.insert("test") bf.include?("test") # => true bf.include?("blah") # => false

bf.delete("test") bf.include?("test") # => false

Hash with a bloom filter!

bf["test2"] = "bar" bf["test2"] # => true bf["test3"] # => false

bf.stats

=> Number of filter bits (m): 10

=> Number of filter elements (n): 2

=> Number of filter hashes (k) : 2

=> Predicted false positive rate = 10.87%


Redis-backed setbit/getbit bloom filter

Uses getbit/setbit on Redis strings - efficient, fast, can be shared by multiple/concurrent processes.

bf = BloomFilter::Redis.new

bf.insert('test') bf.include?('test') # => true bf.include?('blah') # => false

bf.delete('test') bf.include?('test') # => false

Memory footprint

  • 1.0% error rate for 1M items, 10 bits/item: 2.5 mb
  • 1.0% error rate for 150M items, 10 bits per item: 358.52 mb
  • 0.1% error rate for 150M items, 15 bits per item: 537.33 mb

Redis-backed counting bloom filter with TTLs

Uses regular Redis get/set counters to implement a counting filter with optional TTL expiry. Because each "bit" requires its own key in Redis, you do incur a much larger memory overhead.

bf = BloomFilter::CountingRedis.new(:ttl => 2)

bf.insert('test') bf.include?('test') # => true

sleep(2) bf.include?('test') # => false

Credits

Tatsuya Mori [email protected] (Original C implementation: http://vald.x0.com/sb/)

License

MIT License - Copyright (c) 2011 Ilya Grigorik

We use cookies. If you continue to browse the site, you agree to the use of cookies. For more information on our use of cookies please see our Privacy Policy.