-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dedup bloom filter is too slow #22607
Conversation
2fd2467
to
1cc99c0
Compare
Codecov Report
@@ Coverage Diff @@
## master #22607 +/- ##
=========================================
+ Coverage 81.1% 81.5% +0.4%
=========================================
Files 560 555 -5
Lines 151206 149740 -1466
=========================================
- Hits 122633 122080 -553
+ Misses 28573 27660 -913 |
78ffbe2
to
050e536
Compare
Pull request has been modified.
let saturated = self.saturated.load(Ordering::Relaxed); | ||
if saturated || now.duration_since(self.age) > self.max_age { | ||
for i in &self.filter { | ||
i.store(0, Ordering::Relaxed); | ||
} | ||
self.seed = thread_rng().gen(); | ||
self.age = now; | ||
self.saturated.store(false, Ordering::Relaxed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there's ordering and visibility assumptions in these atomics. Specifically around .saturated.store()
and .filter[pos].store()
. Since these are both relaxed, they can be reordered w.r.t. each other. Since the PR/code mentions this'll be running in parallel, it'd be possible for a threads to see .saturated.store(false)
before the filters are cleared.
If that is correct, then I think these orderings should be bumped up, such that the loads become Acquire and the stores become Release. On x86 this is basically free. On arm it becomes correct 😅.
This would also need to apply to all the atomic load/stores of saturated
and filter
. With self.filter[pos].fetch_or()
becoming AcqRel
. Since the filter is checked before saturated
, it's the same reordering/visibility issue.
Benchmarking locally on my intel x86 MBP, I saw basically the same performance.
Baseline
test bench_dedup_baseline ... bench: 43 ns/iter (+/- 2)
test bench_dedup_diff_big_packets ... bench: 416,778 ns/iter (+/- 75,811)
test bench_dedup_diff_small_packets ... bench: 155,807 ns/iter (+/- 19,780)
test bench_dedup_reset ... bench: 314,559 ns/iter (+/- 33,493)
test bench_dedup_same_big_packets ... bench: 376,994 ns/iter (+/- 166,838)
test bench_dedup_same_small_packets ... bench: 116,076 ns/iter (+/- 10,169)
Acquire/Release on .saturated
test bench_dedup_baseline ... bench: 43 ns/iter (+/- 2)
test bench_dedup_diff_big_packets ... bench: 423,094 ns/iter (+/- 76,294)
test bench_dedup_diff_small_packets ... bench: 154,026 ns/iter (+/- 25,957)
test bench_dedup_reset ... bench: 313,213 ns/iter (+/- 24,235)
test bench_dedup_same_big_packets ... bench: 363,302 ns/iter (+/- 164,439)
test bench_dedup_same_small_packets ... bench: 114,966 ns/iter (+/- 24,617)
Commit here: 813e526
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ordering shouldn't make a difference on how this functions since its tolerant of a few false positives or a few false negatives. bad behavior would be if it gets stuck in a loop constantly resetting the whole filter. but I don't think that would be the case.
Faster dedup port of #22607
Problem
bloom filter is too slow
Summary of Changes
use Ahash + vector of atomic u64's that OR accumulate the value
~62ns per packet
False positive rates as a 1m size filter saturates:
Fixes #