
Dedup bloom filter is too slow #22607

Merged · 7 commits · Jan 22, 2022
Conversation

@aeyakovenko (Member) commented Jan 20, 2022

Problem

The dedup bloom filter is too slow.

Summary of Changes

Use AHash plus a vector of atomic u64s that OR-accumulate the hashed values.
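
For illustration, a minimal sketch of that approach (hypothetical names, not the PR's exact code; std's `RandomState` stands in for AHash so the sketch has no dependencies, and the bit-indexing details are assumptions):

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::time::{Duration, Instant};

struct Deduper {
    filter: Vec<AtomicU64>,   // one bit per filter slot, 64 slots per word
    seed: RandomState,        // fresh random hash seed after every reset
    age: Instant,
    max_age: Duration,
    saturated: AtomicBool,    // set elsewhere when the filter fills (omitted)
}

impl Deduper {
    fn new(num_bits: usize, max_age: Duration) -> Self {
        Self {
            filter: (0..num_bits / 64).map(|_| AtomicU64::new(0)).collect(),
            seed: RandomState::new(),
            age: Instant::now(),
            max_age,
            saturated: AtomicBool::new(false),
        }
    }

    /// Returns true if `data` was probably seen since the last reset.
    /// Lock-free: one hash plus one atomic fetch_or per packet.
    fn dedup(&self, data: &[u8]) -> bool {
        let mut hasher = self.seed.build_hasher();
        hasher.write(data);
        let pos = hasher.finish() as usize % (self.filter.len() * 64);
        let bit = 1u64 << (pos % 64);
        // OR the bit in; the previous word value says whether it was set.
        let prev = self.filter[pos / 64].fetch_or(bit, Ordering::Relaxed);
        prev & bit != 0
    }

    /// Clear the filter and re-seed the hasher when it saturates or ages out.
    fn maybe_reset(&mut self, now: Instant) {
        if self.saturated.load(Ordering::Relaxed)
            || now.duration_since(self.age) > self.max_age
        {
            for word in &self.filter {
                word.store(0, Ordering::Relaxed);
            }
            self.seed = RandomState::new();
            self.age = now;
            self.saturated.store(false, Ordering::Relaxed);
        }
    }
}
```

Benchmarks: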

test bench_dedup_baseline           ... bench:          21 ns/iter (+/- 0)
test bench_dedup_diff_big_packets   ... bench:     275,466 ns/iter (+/- 1,630)
test bench_dedup_diff_small_packets ... bench:      54,422 ns/iter (+/- 7,897)
test bench_dedup_reset              ... bench:     310,903 ns/iter (+/- 1,091)
test bench_dedup_same_big_packets   ... bench:     257,100 ns/iter (+/- 2,072)
test bench_dedup_same_small_packets ... bench:      48,878 ns/iter (+/- 244)

~62 ns per packet

False-positive rates as a 1M-sized filter saturates (rising from roughly 0.008% to roughly 0.14%):

false positive rate: 30/395264
false positive rate: 146/572416
false positive rate: 336/714752
false positive rate: 1064/951296
false positive rate: 1453/1022976

Fixes #

@aeyakovenko mentioned this pull request Jan 21, 2022
codecov bot commented Jan 21, 2022

Codecov Report

Merging #22607 (aa5fd86) into master (6edeed8) will increase coverage by 0.4%.
The diff coverage is 82.2%.

@@            Coverage Diff            @@
##           master   #22607     +/-   ##
=========================================
+ Coverage    81.1%    81.5%   +0.4%     
=========================================
  Files         560      555      -5     
  Lines      151206   149740   -1466     
=========================================
- Hits       122633   122080    -553     
+ Misses      28573    27660    -913     

t-nelson previously approved these changes Jan 21, 2022
jstarry previously approved these changes Jan 21, 2022
mergify bot dismissed stale reviews from t-nelson and jstarry Jan 21, 2022 08:37

Pull request has been modified.

perf/src/sigverify.rs (review thread, resolved)
perf/src/sigverify.rs (review thread, outdated, resolved)
Comment on lines +452 to +459
let saturated = self.saturated.load(Ordering::Relaxed);
if saturated || now.duration_since(self.age) > self.max_age {
for i in &self.filter {
i.store(0, Ordering::Relaxed);
}
self.seed = thread_rng().gen();
self.age = now;
self.saturated.store(false, Ordering::Relaxed);
A contributor commented:

It looks like there are ordering and visibility assumptions in these atomics, specifically around .saturated.store() and .filter[pos].store(). Since these are both Relaxed, they can be reordered w.r.t. each other. Since the PR/code mentions this will be running in parallel, it would be possible for a thread to see .saturated.store(false) before the filter words are cleared.

If that is correct, then I think these orderings should be bumped up, such that the loads become Acquire and the stores become Release. On x86 this is basically free. On ARM it becomes correct 😅.

This would also need to apply to all the atomic loads and stores of saturated and filter, with self.filter[pos].fetch_or() becoming AcqRel. Since the filter is checked before saturated, it is subject to the same reordering/visibility issue.
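
(For concreteness, a minimal sketch of the pairing being suggested, with hypothetical names; the reviewer's actual change is in the commit linked below.)

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

struct Filter {
    filter: Vec<AtomicU64>,
    saturated: AtomicBool,
}

impl Filter {
    /// Reset path: Release stores publish the zeroed words together with
    /// the cleared flag.
    fn reset(&self) {
        for word in &self.filter {
            word.store(0, Ordering::Release);
        }
        self.saturated.store(false, Ordering::Release);
    }

    /// Hot path: the read-modify-write becomes AcqRel, per the suggestion.
    fn test_and_set(&self, pos: usize) -> bool {
        let bit = 1u64 << (pos % 64);
        let prev = self.filter[pos / 64].fetch_or(bit, Ordering::AcqRel);
        prev & bit != 0
    }

    /// Acquire pairs with the Release stores in `reset`: a thread that
    /// reads saturated == false also observes the cleared filter words.
    fn is_saturated(&self) -> bool {
        self.saturated.load(Ordering::Acquire)
    }
}
```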

Benchmarking locally on my Intel x86 MBP, I saw basically the same performance.

Baseline

test bench_dedup_baseline           ... bench:          43 ns/iter (+/- 2)
test bench_dedup_diff_big_packets   ... bench:     416,778 ns/iter (+/- 75,811)
test bench_dedup_diff_small_packets ... bench:     155,807 ns/iter (+/- 19,780)
test bench_dedup_reset              ... bench:     314,559 ns/iter (+/- 33,493)
test bench_dedup_same_big_packets   ... bench:     376,994 ns/iter (+/- 166,838)
test bench_dedup_same_small_packets ... bench:     116,076 ns/iter (+/- 10,169)

Acquire/Release on .saturated

test bench_dedup_baseline           ... bench:          43 ns/iter (+/- 2)
test bench_dedup_diff_big_packets   ... bench:     423,094 ns/iter (+/- 76,294)
test bench_dedup_diff_small_packets ... bench:     154,026 ns/iter (+/- 25,957)
test bench_dedup_reset              ... bench:     313,213 ns/iter (+/- 24,235)
test bench_dedup_same_big_packets   ... bench:     363,302 ns/iter (+/- 164,439)
test bench_dedup_same_small_packets ... bench:     114,966 ns/iter (+/- 24,617)

Commit here: 813e526

@aeyakovenko (Member, Author) replied Jan 21, 2022:

Ordering shouldn't make a difference in how this functions, since it's tolerant of a few false positives or false negatives. Bad behavior would be if it got stuck in a loop constantly resetting the whole filter, but I don't think that would be the case.

perf/src/sigverify.rs (review thread, outdated, resolved)
This was referenced Jan 21, 2022
aeyakovenko added a commit that referenced this pull request Jan 21, 2022
Faster dedup port of #22607