Serialization benchmarks. #808

arsh · 2024-03-10T10:37:09Z

Description of change

These Criterion benchmarks show a performance gain (~33% faster) from switching from bincode to bincode2 when reading cached blocks from disk.
I want to get feedback on changes made to DiskDataCache to abstract out the serialization library. In particular: is this the right abstraction, and am I using trait bounds correctly? So, mostly feedback on the Rust side of things.
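For reference, here is a minimal sketch of one way to put the serializer behind a trait and make the cache generic over it. All names here (`BlockSerializer`, `CachedBlock`, `LengthPrefixed`, and this cut-down `DiskDataCache`) are hypothetical and are not the code in this PR. A real implementation would wrap bincode or bincode2 instead of the hand-rolled length-prefixed format used below, which exists only so the example runs with the standard library alone.

```rust
use std::io::{self, Read, Write};

/// A cached block as stored on disk (hypothetical layout for illustration).
#[derive(Debug, PartialEq)]
pub struct CachedBlock {
    pub offset: u64,
    pub data: Vec<u8>,
}

/// Hypothetical trait hiding the serialization library from the cache.
/// The `Send + Sync` supertraits let one serializer be shared across threads.
pub trait BlockSerializer: Send + Sync {
    fn serialize_into<W: Write>(&self, block: &CachedBlock, writer: W) -> io::Result<()>;
    fn deserialize_from<R: Read>(&self, reader: R) -> io::Result<CachedBlock>;
}

/// Toy stand-in for bincode/bincode2: little-endian offset, length, then bytes.
pub struct LengthPrefixed;

impl BlockSerializer for LengthPrefixed {
    fn serialize_into<W: Write>(&self, block: &CachedBlock, mut writer: W) -> io::Result<()> {
        writer.write_all(&block.offset.to_le_bytes())?;
        writer.write_all(&(block.data.len() as u64).to_le_bytes())?;
        writer.write_all(&block.data)
    }

    fn deserialize_from<R: Read>(&self, mut reader: R) -> io::Result<CachedBlock> {
        let mut word = [0u8; 8];
        reader.read_exact(&mut word)?;
        let offset = u64::from_le_bytes(word);
        reader.read_exact(&mut word)?;
        let len = u64::from_le_bytes(word) as usize;
        let mut data = vec![0u8; len];
        reader.read_exact(&mut data)?;
        Ok(CachedBlock { offset, data })
    }
}

/// The cache only knows about the trait; the bound lives on the impl below.
pub struct DiskDataCache<S> {
    serializer: S,
}

impl<S: BlockSerializer> DiskDataCache<S> {
    pub fn new(serializer: S) -> Self {
        Self { serializer }
    }

    /// Serializes a block into an in-memory buffer (a real cache would write a file).
    pub fn encode(&self, block: &CachedBlock) -> io::Result<Vec<u8>> {
        let mut buf = Vec::new();
        self.serializer.serialize_into(block, &mut buf)?;
        Ok(buf)
    }

    /// Deserializes a block from bytes previously produced by `encode`.
    pub fn decode(&self, bytes: &[u8]) -> io::Result<CachedBlock> {
        self.serializer.deserialize_from(bytes)
    }
}

fn main() {
    let cache = DiskDataCache::new(LengthPrefixed);
    let block = CachedBlock { offset: 4096, data: b"hello".to_vec() };
    let bytes = cache.encode(&block).unwrap();
    assert_eq!(bytes.len(), 8 + 8 + 5);
    assert_eq!(cache.decode(&bytes).unwrap(), block);
}
```

In this sketch the `S: BlockSerializer` bound sits on the `impl` rather than on the struct definition, a common Rust convention that keeps the bound from having to be repeated everywhere the type is named. The `Send + Sync` supertraits are an assumption about how the cache is shared.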

Benchmark results

In summary, bincode2 performs better than bincode and comes closer to the baseline of reading a file's contents into memory (read_file in the graphs). The flamegraphs showed that bincode zeroes the vectors in its fill_buffer implementation, whereas bincode2 uses a different, more efficient implementation.
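To make the flamegraph finding concrete, here is a standard-library-only sketch of two ways to read a payload of known length into a `Vec<u8>`. It is not bincode's or bincode2's actual code, and `read_zeroed` and `read_unzeroed` are hypothetical helpers. The first allocates a zero-filled buffer and then overwrites it with `read_exact`, so each byte is written twice; the second only reserves capacity and lets `Read::read_to_end` append through `take`, so the caller does no explicit zeroing pass.

```rust
use std::io::{self, Read};

/// Zero-fills a buffer of `len` bytes, then overwrites it with `read_exact`.
/// Every byte is written twice: once by the zeroing, once by the read.
fn read_zeroed<R: Read>(reader: &mut R, len: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; len];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

/// Reserves capacity only and lets `read_to_end` append up to `len` bytes,
/// so this function performs no explicit zeroing pass of its own.
fn read_unzeroed<R: Read>(reader: &mut R, len: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(len);
    reader.take(len as u64).read_to_end(&mut buf)?;
    if buf.len() != len {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "payload truncated"));
    }
    Ok(buf)
}

fn main() -> io::Result<()> {
    // 1 MiB of sample data standing in for a cached block.
    let payload: Vec<u8> = (0..=255u8).cycle().take(1 << 20).collect();
    let a = read_zeroed(&mut payload.as_slice(), payload.len())?;
    let b = read_unzeroed(&mut payload.as_slice(), payload.len())?;
    assert_eq!(a, payload);
    assert_eq!(b, payload);
    // Both variants report a short input as an error.
    assert!(read_zeroed(&mut &payload[..10], 20).is_err());
    assert!(read_unzeroed(&mut &payload[..10], 20).is_err());
    Ok(())
}
```

How much initialization `read_to_end` itself can skip depends on the reader and the Rust version, so the difference should be confirmed by timing both variants, for example in a Criterion benchmark like the ones in this PR.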

Reading a file (baseline)

Reading cache through bincode

Reading cache through bincode2

Relevant issues: N/A

Does this change impact existing behavior?

No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

Signed-off-by: Andres Santana <[email protected]>

passaro

This is great! I'll add a few specific comments, but first:

> I want to get feedback on changes made to DiskDataCache to abstract out the serialization library

Do we need to abstract the serialization library? Any reason not to just replace it?

arsh · 2024-03-11T09:19:38Z

> This is great! I'll add a few specific comments, but first:
>
> > I want to get feedback on changes made to DiskDataCache to abstract out the serialization library
>
> Do we need to abstract the serialization library? Any reason not to just replace it?

I wanted to test both libraries side by side, which required making the serializer configurable; hence this abstraction. We don't necessarily need it. However, we could be in the same situation in the future, wanting to test a new library, and it would be nice to have this in place so we can just plug the new one in.
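Since the point of the abstraction is to choose the library from configuration, here is a minimal sketch of runtime selection through a trait object, assuming a hypothetical `SerializerKind` configuration value and a hypothetical `Codec` trait. The two codecs are toy stand-ins, not wrappers around bincode or bincode2; plugging in a new library would mean adding one enum variant and one `impl Codec`.

```rust
use std::io;

/// Hypothetical codec trait with no generic methods, so it can be used as
/// `Box<dyn Codec>` and chosen at runtime.
trait Codec: Send + Sync {
    fn name(&self) -> &'static str;
    fn encode(&self, data: &[u8]) -> Vec<u8>;
    fn decode(&self, bytes: &[u8]) -> io::Result<Vec<u8>>;
}

/// Toy codec: a little-endian u64 length header followed by the payload.
struct LengthPrefixed;

impl Codec for LengthPrefixed {
    fn name(&self) -> &'static str {
        "length-prefixed"
    }
    fn encode(&self, data: &[u8]) -> Vec<u8> {
        let mut out = Vec::with_capacity(8 + data.len());
        out.extend_from_slice(&(data.len() as u64).to_le_bytes());
        out.extend_from_slice(data);
        out
    }
    fn decode(&self, bytes: &[u8]) -> io::Result<Vec<u8>> {
        if bytes.len() < 8 {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "missing length header"));
        }
        let mut header = [0u8; 8];
        header.copy_from_slice(&bytes[..8]);
        let len = u64::from_le_bytes(header) as usize;
        let payload = &bytes[8..];
        if payload.len() != len {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "length mismatch"));
        }
        Ok(payload.to_vec())
    }
}

/// Toy codec: stores the bytes unchanged.
struct Passthrough;

impl Codec for Passthrough {
    fn name(&self) -> &'static str {
        "passthrough"
    }
    fn encode(&self, data: &[u8]) -> Vec<u8> {
        data.to_vec()
    }
    fn decode(&self, bytes: &[u8]) -> io::Result<Vec<u8>> {
        Ok(bytes.to_vec())
    }
}

/// Hypothetical configuration knob; adding a library means adding a variant.
#[derive(Clone, Copy, Debug)]
enum SerializerKind {
    LengthPrefixed,
    Passthrough,
}

fn make_codec(kind: SerializerKind) -> Box<dyn Codec> {
    match kind {
        SerializerKind::LengthPrefixed => Box::new(LengthPrefixed),
        SerializerKind::Passthrough => Box::new(Passthrough),
    }
}

fn main() {
    let block = b"cached block contents".to_vec();
    for kind in [SerializerKind::LengthPrefixed, SerializerKind::Passthrough] {
        let codec = make_codec(kind);
        let bytes = codec.encode(&block);
        assert_eq!(codec.decode(&bytes).unwrap(), block);
        println!("{kind:?} -> {} ({} bytes)", codec.name(), bytes.len());
    }
}
```

Compared with a generic parameter, this costs a dynamic call per operation and rules out generic methods callable through `dyn Codec`, but it lets both implementations live in one binary and be switched without recompiling, which is what side-by-side benchmarking needs.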

jamesbornholt · 2024-03-11T18:30:37Z

This is cool, but bincode2 hasn't been updated for 4 years and looks unmaintained, so not sure we should be switching to it.

jamesbornholt · 2024-03-11T18:32:25Z

mountpoint-s3/Cargo.toml

 bincode = "1.3.3"
+bincode2 = "2.0.1"


I don't like that we have to take a non-dev-dependency on "all the versions we want to benchmark" here. I'd rather put it behind a feature flag, or just not commit the versions we're not using.
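One way to act on the feature-flag suggestion, sketched under assumptions: declare the extra library as an optional dependency, e.g. `bincode2 = { version = "2.0.1", optional = true }` in Cargo.toml, which implicitly creates a Cargo feature named `bincode2`, and gate every use of it with `#[cfg(feature = "bincode2")]`. The module and function names below are hypothetical, and the gated module only holds a placeholder constant where bincode2-backed code would go.

```rust
/// Compiled only with `--features bincode2`; in the real crate this is where
/// the code calling the optional `bincode2` dependency would live.
#[cfg(feature = "bincode2")]
mod bincode2_backend {
    pub const NAME: &str = "bincode2";
}

/// Lists the serialization backends compiled into this build.
fn available_backends() -> Vec<&'static str> {
    #[allow(unused_mut)] // only mutated when the `bincode2` feature is on
    let mut backends = vec!["bincode"];
    #[cfg(feature = "bincode2")]
    backends.push(bincode2_backend::NAME);
    backends
}

fn main() {
    let backends = available_backends();
    // The required dependency is always present; the optional one only with the feature.
    assert_eq!(backends[0], "bincode");
    println!("compiled backends: {backends:?}");
}
```

The benchmarks could then be run with `cargo bench --features bincode2`, while default builds do not compile the extra crate.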

arsh · 2024-03-13T13:59:56Z

> This is cool, but bincode2 hasn't been updated for 4 years and looks unmaintained, so not sure we should be switching to it.

bincode 1.x is in a similar boat: the version we use, 1.3.3, was released 3 years ago, and since then bincode has moved on to 2.x. The 2.x versions are tagged rc, which I presume means release candidate and not yet production-ready. Since 2.x claims to be a rewrite, I ran the same benchmarks with it and did not see the improvement we got from bincode2.

One way forward is to switch to bincode 2.x, profile it to determine why it is slower than bincode2, and potentially submit a PR upstream to fix it.

More importantly, I was hoping this micro-optimization would improve overall cache performance as described in #719, but when I tested it following the steps described there, I did not see an improvement. This suggests that something else (to be determined) is the bottleneck.

Serialization benchmarks.

38ce78d

Signed-off-by: Andres Santana <[email protected]>

arsh temporarily deployed to PR integration tests on March 10, 2024 10:37 with GitHub Actions

arsh requested review from jamesbornholt and passaro March 10, 2024 10:37

Fixing clippy and format issues.

710bc03

Signed-off-by: Andres Santana <[email protected]>

arsh temporarily deployed to PR integration tests on March 10, 2024 10:52 with GitHub Actions

passaro reviewed Mar 11, 2024


jamesbornholt reviewed Mar 11, 2024
