Recently I evaluated Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO, mostly with LLVM BOLT) on multiple projects. The results are available here. According to those tests, these optimizations can improve performance in many cases, including databases. I think it would be worth trying them on this project to get more performance.
I already did some (very basic!) benchmarks and want to share my results here.
Test environment
Fedora 39
Linux kernel 6.8.7
AMD Ryzen 9 5900X
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Compiler - Rustc 1.77.2
bbolt-rs version: the latest main branch at the time of testing, commit 980b96b81768c4c8c78034a99aa7004dc4672674
Disabled Turbo boost
Benchmark
For benchmark purposes, I used these benchmarks. The PGO training workload was a run of the bench command. The release and PGO-optimized results were generated with bench -c 10000000.
All PGO and PLO optimizations were done with cargo-pgo. All tests were run on the same machine, multiple times, with the same background "noise" (as far as I can guarantee, of course), and the results are consistent enough across runs. taskset -c 0 was used to pin the process to a single CPU and reduce interference from the OS scheduler.
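For readers who want to repeat the procedure, here is a minimal sketch of the PGO part of it. It assumes cargo-pgo is installed; the binary path passed to pgo_workflow is a placeholder (check the cargo-pgo build output for the real one), and the run helper with its DRY_RUN switch is only an illustration device, not part of cargo-pgo or bbolt-rs. The BOLT steps follow the same build, run, optimize pattern through the cargo pgo bolt subcommands.

```shell
#!/bin/sh
# Sketch of the PGO workflow with cargo-pgo. Binary paths are placeholders.

# Print each command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: path to the built binary (hypothetical; see the cargo-pgo build output).
pgo_workflow() {
    bin="$1"
    run cargo pgo build || return 1        # build with PGO instrumentation
    run "$bin" bench || return 1           # training run, writes the profiles
    run cargo pgo optimize || return 1     # rebuild using the collected profiles
    run taskset -c 0 "$bin" bench -c 10000000   # measure, pinned to CPU 0
}
```

Calling pgo_workflow with DRY_RUN=1 set only prints the four commands, which is a cheap way to review the sequence before running it for real.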
According to the tests above, I see measurable performance improvements from enabling PGO. However, PLO with LLVM BOLT didn't show measurable improvements, at least in the simple test above.
For anyone interested in binary sizes, I collected some statistics too (without debug symbols stripping):
The only interesting case here is the last one, "Release + PGO optimized + BOLT optimized". I don't know why the binary size increased so much. My guess is that some BOLT option could fix this, but it is only a guess for now.
Further steps
I can suggest the following action points:
Perform more PGO and PLO benchmarks on the database in more scenarios. If it shows improvements - add a note to the documentation about possible improvements in the project performance with PGO.
Providing an easier way (e.g. a build option) to build scripts with PGO can be helpful for the end-users and maintainers since they will be able to optimize bbolt-rs according to their workloads.
Here are some examples of how PGO optimization is integrated into other projects:
I would be happy to answer your questions about all the optimizations above. Please do not treat the issue as a bug or something like that - it's just an idea of how the project performance can be improved.
Also, would you mind describing the 2nd step a bit more? I'm not sure what you mean.
Sure! By "Providing an easier way (e.g. a build option) to build scripts with PGO" I mean extending the existing bbolt-rs build infra (scripts) with an additional option: building the database with PGO. It could look like make build_with_pgo (just an example!). With one simple command, it would be easier for users to build their own bbolt-rs version and tune it according to their workloads. Whether it's worth it is up to you. My 0.5$: it's not worth it at the current stage of the project lifecycle.
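To make the idea a bit more concrete, here is a rough sketch of what such a one-command entry point could look like as a shell function. Everything in it is hypothetical: bbolt-rs has no build_with_pgo today, the binary path is supplied by the caller, and the CARGO override is just a convenience of this sketch. The point is that the training workload is an argument, so each user can profile with their own scenario.

```shell
#!/bin/sh
# Hypothetical one-command PGO build; nothing like this exists in bbolt-rs.
# Usage: build_with_pgo <instrumented-binary> <workload args...>
build_with_pgo() {
    if [ "$#" -lt 2 ]; then
        echo "usage: build_with_pgo <binary> <workload args...>" >&2
        return 2
    fi
    bin="$1"
    shift
    ${CARGO:-cargo} pgo build || return 1   # build with PGO instrumentation
    "$bin" "$@" || return 1                 # user-supplied training workload
    ${CARGO:-cargo} pgo optimize            # rebuild with the collected profiles
}
```

A call would then pass the instrumented binary followed by the workload, for example the bench command used for training above.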
Results
Let's start with the results.
Release:
Release + PGO optimization:
Release + PGO optimization + BOLT optimization:
(just for reference) Release + PGO instrumentation:
(just for reference again) Release + PGO optimized + BOLT instrumented: