Skip to content

A four-dimensional Analysis of Partitioned Approximate Filters

License

Notifications You must be signed in to change notification settings

tum-db/partitioned-filters

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Partitioned Filters

The repository contains our implementations of four approximate filters: the Bloom filter [1], the Cuckoo filter [2], the Morton filter [3], and the Xor filter [4]. We used the code in our paper A four-dimensional Analysis of Partitioned Filters.

In addition to our optimized filter implementations, the repository also contains the code of state-of-the-art competitors we compare to and extensive test cases. We generate the benchmarks using python scripts and included our results on an Intel i9-9900x (Skylake-X) with 64 GiB memory.

Using the Code

Prerequisites

  • C++20 (we used GCC 10.2)
  • CMake version 3.10 & make
  • Linux (we used Ubuntu 20.10 with kernel version 5.8)
  • Python3 + virtualenv (python is only needed for generating the benchmarks and the plots)

Tests

make test executes all available test (~400 test cases that test the correctness of our implementation (no false-negatives) and different aspects like multi-threading, partitioning, vectorization, etc)

Benchmarks

make benchmark generates the benchmark and the user can select the benchmark to execute. The results are written to csv-files in the benchmark/paper folder. All measurements are repeated five times to get stable results.

Executing all benchmarks takes roughly 1 week and requires 64 GiB memory. Some of the benchmarks do measure only the false-positive rate and the failed builds and, thus, should be executed with all available threads.

The csv includes the following fields:

Field Unit Description
name Text Configuration: <BenchmarkName>_<k>/ <Fixture>/ <s> / <n_threads> / <n_partitions> / <elements_build> / <elements_lookup> / <shared_elements> / _ / _
real_time milliseconds execution time per iteration
DTLB-misses float data translation lookaside buffer misses per iteration
ITLB-misses float instruction translation lookaside buffer misses per iteration
L1D-misses float level 1 data cache misses per iteration
L1I-misses float level 1 instruction cache misses per iteration
LLC-misses float last-level (L3) cache misses per iteration
branch-misses float branch misses per iteration
cycles float cycles per iteration
instruction float executed instructions per iteration
FPR float false-positive rate of the filter (only available when lookup is benchmarked)
failures integer number of failed builds
retries integer number of retries needed to build the filter
bits float number of bits per key allocated to the filter
size integer size of the filter in bytes

Plots

make plot generates all plots used in the paper. The benchmark data is taken from benchmark/paper.

Repository Structure

  • benchmark: code for benchmarking the filter and the definition files with our results on the Skylake-X machine.
  • cmake: optional packages.
  • lib: external dependencies and existing filter implementations. The code in this folder is not licensed under the MIT License (see Dependencies).
  • python: scripts for generating, executing and plotting benchmarks.
  • src: filter implementations.
  • test: extensive test cases for our filter implementations and the integration the competitors.
  • vendor: external packages.

Dependencies

  • lib/amd_mortonfilter: original Morton Filter implementation used in [3], licensed under the MIT License.
  • lib/bsd: (register-)blocked and (cache-)sectorized Bloom Filter implementations with SIMD support and external competitors used in [1], licensed under the Apache License (Version 2.0), the 2-clause BSD License, and the 3-clause BSD License.
  • lib/cityhash: Google's CityHash implementation, licensed under the MIT License.
  • lib/efficient_cuckoofilter: original Cuckoo Filter implementation used in [2], licensed under the Apache License (Version 2.0).
  • lib/fastfilter: original Xor Filter implementation used in [4] licensed under the Apache License (Version 2.0).
  • lib/impala: original sectorized Bloom Filter used in the Impala, licensed under the Apache License (Version 2.0).
  • lib/libdivide: the LibDivide library computes magic numbers for optimizing integer divisions, licensed under the zlib License.
  • lib/perfevent: library for reading perf counters in C(++), licensed under the MIT License.
  • lib/vacuumfilter: Vacuum Filter implementation.

Related Work

[1] Performance-Optimal Filtering: Bloom Overtakes Cuckoo at High Throughput

[2] Cuckoo Filter: Practically Better Than Bloom

[3] Morton Filters: Faster, Space-Efficient Cuckoo Filters via Biasing, Compression, and Decoupled Logical Sparsity

[4] Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters

About

A four-dimensional Analysis of Partitioned Approximate Filters

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published