forked from cometbft/cometbft
-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(blockstore): Add LRU caches to blockstore operations used in con… #76
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…sensus (backport cometbft#3003) (cometbft#3083) Closes cometbft#2844 We are seeing that the blockstore loading operations get used in hot loops within gossip routines, and queryMaj23 routines. This PR reduces that overhead using an LRU cache. The LRU cache does have a mutex on every get, but the time with the LRU cache is 9x faster than without (before even adding in DB overheads), due to the proto unmarshalling saved. We could imagine a setup where we avoided a lock there entirely. I don't think this is worth right now, since the new code is 9x faster, and these mostly appear in catchup code which should not be highly contended for across peers at the same time. With the new benchmark I added: OLD: ``` BenchmarkRepeatedLoadSeenCommit-12 24447 54691 ns/op 46495 B/op 319 allocs/op ``` NEW: ``` BenchmarkRepeatedLoadSeenCommit-12 224131 6401 ns/op 8320 B/op 2 allocs/op ``` It turns out these gossip routines don't need mutative copies, so we could optimize out the large allocation in the future if we want. 1 hour cpu profile that shows this appearing in prod: ![image](https://github.com/cometbft/cometbft/assets/6440154/5a7e0f02-8385-4c01-aa6a-dba2a2bf376d) The state machine execution time here for context is 92 seconds. So this is adding up in system load (and GC! The GC load is 52GB, the entire trace is 200GB, with other parts being optimized down from recent PRs) --- - [ ] Tests written/updated - [ ] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [ ] Updated relevant documentation (`docs/` or `spec/`) and code comments - [ ] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request cometbft#3003 done by [Mergify](https://mergify.com). --------- Co-authored-by: Dev Ojha <[email protected]> Co-authored-by: Anton Kaliaev <[email protected]>
7 tasks
PaddyMc
added a commit
that referenced
this pull request
May 28, 2024
…… (backport #76) (#81) * perf(blockstore): Add LRU caches to blockstore operations used in consensus (backport cometbft#3003) (cometbft#3083) Closes cometbft#2844 We are seeing that the blockstore loading operations get used in hot loops within gossip routines, and queryMaj23 routines. This PR reduces that overhead using an LRU cache. The LRU cache does have a mutex on every get, but the time with the LRU cache is 9x faster than without (before even adding in DB overheads), due to the proto unmarshalling saved. We could imagine a setup where we avoided a lock there entirely. I don't think this is worth right now, since the new code is 9x faster, and these mostly appear in catchup code which should not be highly contended for across peers at the same time. With the new benchmark I added: OLD: ``` BenchmarkRepeatedLoadSeenCommit-12 24447 54691 ns/op 46495 B/op 319 allocs/op ``` NEW: ``` BenchmarkRepeatedLoadSeenCommit-12 224131 6401 ns/op 8320 B/op 2 allocs/op ``` It turns out these gossip routines don't need mutative copies, so we could optimize out the large allocation in the future if we want. 1 hour cpu profile that shows this appearing in prod: ![image](https://github.com/cometbft/cometbft/assets/6440154/5a7e0f02-8385-4c01-aa6a-dba2a2bf376d) The state machine execution time here for context is 92 seconds. So this is adding up in system load (and GC! The GC load is 52GB, the entire trace is 200GB, with other parts being optimized down from recent PRs) --- - [ ] Tests written/updated - [ ] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [ ] Updated relevant documentation (`docs/` or `spec/`) and code comments - [ ] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request cometbft#3003 done by [Mergify](https://mergify.com). --------- Co-authored-by: Dev Ojha <[email protected]> Co-authored-by: Anton Kaliaev <[email protected]> (cherry picked from commit 0c10bd5) * Add Changelog (cherry picked from commit 4594f29) # Conflicts: # CHANGELOG.md --------- Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Dev Ojha <[email protected]> Co-authored-by: PaddyMc <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
…sensus (backport cometbft#3003) (cometbft#3083)
Closes cometbft#2844
We are seeing that the blockstore loading operations get used in hot loops within gossip routines, and queryMaj23 routines. This PR reduces that overhead using an LRU cache.
The LRU cache does have a mutex on every get, but the time with the LRU cache is 9x faster than without (before even adding in DB overheads), due to the proto unmarshalling saved. We could imagine a setup where we avoided a lock there entirely. I don't think this is worth right now, since the new code is 9x faster, and these mostly appear in catchup code which should not be highly contended for across peers at the same time.
With the new benchmark I added:
OLD:
NEW:
It turns out these gossip routines don't need mutative copies, so we could optimize out the large allocation in the future if we want.
1 hour cpu profile that shows this appearing in prod:
The state machine execution time here for context is 92 seconds. So this is adding up in system load (and GC! The GC load is 52GB, the entire trace is 200GB, with other parts being optimized down from recent PRs)
.changelog
(we use unclog to manage our changelog)docs/
orspec/
) and code commentsThis is an automatic backport of pull request perf(blockstore): Add LRU caches to blockstore operations used in consensus cometbft/cometbft#3003 done by Mergify.
PR checklist
.changelog
(we use unclog to manage our changelog)docs/
orspec/
) and code comments