Need help with bloomfilter-rb?
Click the “chat” button below for chat support from the developer who created it, or find similar developers for support.
igrigorik

Description

BloomFilter(s) in Ruby: Native counting filter + Redis counting/non-counting filters

444 Stars 61 Forks 114 Commits 9 Opened issues

Services available

Need anything else?

BloomFilter(s) in Ruby

  • Native (MRI/C) counting bloom filter
  • Redis-backed getbit/setbit non-counting bloom filter
  • Redis-backed set-based counting (+TTL) bloom filter

Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. For more detail, check the wikipedia article. Instead of using k different hash functions, this implementation seeds the CRC32 hash with k different initial values (0, 1, ..., k-1). This may or may not give you a good distribution, it all depends on the data.

Performance of the Bloom filter depends on a number of variables:

  • size of the bit array
  • size of the counter bucket
  • number of hash functions

Resources


MRI/C API Example

MRI/C implementation which creates an in-memory filter which can be saved and reloaded from disk.

require 'bloomfilter-rb'

bf = BloomFilter::Native.new(:size => 100, :hashes => 2, :seed => 1, :bucket => 3, :raise => false) bf.insert("test") bf.include?("test") # => true bf.include?("blah") # => false

bf.delete("test") bf.include?("test") # => false

Hash with a bloom filter!

bf["test2"] = "bar" bf["test2"] # => true bf["test3"] # => false

bf.stats

=> Number of filter bits (m): 10

=> Number of filter elements (n): 2

=> Number of filter hashes (k) : 2

=> Predicted false positive rate = 10.87%


Redis-backed setbit/getbit bloom filter

Uses getbit/setbit on Redis strings - efficient, fast, can be shared by multiple/concurrent processes.

bf = BloomFilter::Redis.new

bf.insert('test') bf.include?('test') # => true bf.include?('blah') # => false

bf.delete('test') bf.include?('test') # => false

Memory footprint

  • 1.0% error rate for 1M items, 10 bits/item: 2.5 mb
  • 1.0% error rate for 150M items, 10 bits per item: 358.52 mb
  • 0.1% error rate for 150M items, 15 bits per item: 537.33 mb

Redis-backed counting bloom filter with TTLs

Uses regular Redis get/set counters to implement a counting filter with optional TTL expiry. Because each "bit" requires its own key in Redis, you do incur a much larger memory overhead.

bf = BloomFilter::CountingRedis.new(:ttl => 2)

bf.insert('test') bf.include?('test') # => true

sleep(2) bf.include?('test') # => false

Credits

Tatsuya Mori [email protected] (Original C implementation: http://vald.x0.com/sb/)

License

MIT License - Copyright (c) 2011 Ilya Grigorik

We use cookies. If you continue to browse the site, you agree to the use of cookies. For more information on our use of cookies please see our Privacy Policy.