Module blocked

Module blocked 

Source
Expand description

Blocked bloom filter implementation for efficient database queries.

A blocked bloom filter partitions the filter into fixed-size blocks, where each element hashes to exactly one block for bits to be set, essentially a hash-indexed array of smaller bloom filters. Each block is stored as a separate row. This allows us to query specific blocks for membership instead of scanning the entire filter for improved IO.

  1. Targeted queries: When checking membership, only the specific block for that element needs to be loaded from the database, not the entire filter.

  2. Incremental updates: Blocks can be OR’d together independently, allowing bloom filters from multiple checkpoints to be merged at the block level without loading entire filters. Note: Do not use folding with blocked bloom filters. Merging requires fixed-size blocks; variable-sized blocks from folding would cause the merge to fail.

  3. Sparse storage: Only blocks with bits set need to be stored. In practice, with well-chosen parameters, values should be uniformly distributed across blocks.

The filter uses double hashing: the first hash selects the block, and subsequent hashes (with a different seed to prevent correlation) select bit positions within that block.

  Blocked Bloom Filter (128 blocks × 2 KB each)
  ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
  │  0  │  1  │  2  │  3  │ ... │ 125 │ 126 │ 127 │
  └─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
           ▲
           │
      hash(key, seed) selects block, then subsequent hash(key, seed1) sets bits within that block

Structs§

BlockedBloomFilter
A generic blocked bloom filter.
BlockedBloomProbe
Probe for checking membership in a blocked bloom filter.