Expand description
Blocked bloom filter implementation for efficient database queries.
A blocked bloom filter partitions the filter into fixed-size blocks, where each element hashes to exactly one block for bits to be set, essentially a hash-indexed array of smaller bloom filters. Each block is stored as a separate row. This allows us to query specific blocks for membership instead of scanning the entire filter for improved IO.
-
Targeted queries: When checking membership, only the specific block for that element needs to be loaded from the database, not the entire filter.
-
Incremental updates: Blocks can be OR’d together independently, allowing bloom filters from multiple checkpoints to be merged at the block level without loading entire filters. Note: Do not use folding with blocked bloom filters. Merging requires fixed-size blocks; variable-sized blocks from folding would cause the merge to fail.
-
Sparse storage: Only blocks with bits set need to be stored. In practice, with well-chosen parameters, values should be uniformly distributed across blocks.
The filter uses double hashing: the first hash selects the block, and subsequent hashes (with a different seed to prevent correlation) select bit positions within that block.
Blocked Bloom Filter (128 blocks × 2 KB each)
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│ 0 │ 1 │ 2 │ 3 │ ... │ 125 │ 126 │ 127 │
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
▲
│
hash(key, seed) selects block, then subsequent hash(key, seed1) sets bits within that blockStructs§
- Blocked
Bloom Filter - A generic blocked bloom filter.
- Blocked
Bloom Probe - Probe for checking membership in a blocked bloom filter.