
Perf: Relax locking contention for cache and cachekv #353

Merged: 5 commits merged into main from yzang/SEI-6061 on Dec 5, 2023

Conversation

@yzang2019 (Collaborator) commented Nov 17, 2023

Describe your changes and provide context

Problem:
Profiling currently shows a lot of lock contention in the cachekv layer. This is because we take a mutex for every key read and write, but cachekv, as a transient cache, doesn't really need such a strict locking mechanism. High lock contention significantly hurts parallel transaction execution performance.

Solution:

  • Replace BoundedCache with sync.Map to get per-key locking. We don't need to bound the cache size for the transient cachekv store, since the cache is destroyed once the block is finalized.
  • Do not read through the cache. Previously, Get also wrote the fetched value back to the cache, which required a lock around the whole read + write-back operation. For a transient cache this read-through behavior doesn't improve the hit rate much, and removing it reduces contention considerably (see the sketch below).
  • Relax and narrow the locking scope for commitkvcache, which will still be used as an inter-block cache.
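
A minimal sketch of the first two points, using illustrative names (parentStore, transientKV) rather than the repository's actual types: dirty writes live in a sync.Map, and a cache miss on Get falls through to the parent without writing the value back, so the read path needs no store-wide mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// parentStore stands in for the underlying committed KV store.
type parentStore struct {
	mtx  sync.RWMutex
	data map[string][]byte
}

func (p *parentStore) Get(key string) []byte {
	p.mtx.RLock()
	defer p.mtx.RUnlock()
	return p.data[key]
}

// transientKV is an illustrative cachekv-style store: writes go into a
// sync.Map (per-key synchronization), and a Get miss falls through to the
// parent WITHOUT populating the cache (no read-through).
type transientKV struct {
	cache  sync.Map // string -> []byte
	parent *parentStore
}

func (s *transientKV) Set(key string, value []byte) {
	s.cache.Store(key, value)
}

func (s *transientKV) Get(key string) []byte {
	if v, ok := s.cache.Load(key); ok {
		return v.([]byte)
	}
	// Miss: read from the parent directly and do not write the value back,
	// so no read+write-back critical section is needed.
	return s.parent.Get(key)
}

func main() {
	parent := &parentStore{data: map[string][]byte{"k1": []byte("committed")}}
	s := &transientKV{parent: parent}
	s.Set("k2", []byte("dirty"))
	fmt.Printf("%s %s\n", s.Get("k1"), s.Get("k2")) // committed dirty
}
```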

Testing performed to validate your change

Fully tested in loadtest env

codecov bot commented Nov 21, 2023

Codecov Report

Merging #353 (fe8744a) into main (5f04ca7) will decrease coverage by 0.04%.
Report is 1 commit behind head on main.
The diff coverage is 91.80%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #353      +/-   ##
==========================================
- Coverage   55.41%   55.38%   -0.04%     
==========================================
  Files         620      620              
  Lines       51694    51605      -89     
==========================================
- Hits        28646    28581      -65     
+ Misses      20964    20941      -23     
+ Partials     2084     2083       -1     
Files Coverage Δ
store/cache/cache.go 78.57% <100.00%> (+0.31%) ⬆️
store/cachekv/store.go 81.57% <100.00%> (+9.28%) ⬆️
x/auth/ante/sigverify.go 63.35% <100.00%> (-0.23%) ⬇️
x/auth/types/params.go 76.04% <ø> (ø)
x/auth/ante/batch_sigverify.go 0.00% <0.00%> (ø)

... and 2 files with indirect coverage changes

@stevenlanders (Contributor) left a comment

This looks good - it might make sense to do a gobench test before/after the change to help double-check the throughput increases (hard to eyeball, this could be much faster, not sure)
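
A hedged sketch of such a benchmark; the store type below is a stand-in, not the actual cachekv API. Running it with `go test -bench` on the commits before and after this change and comparing the outputs with benchstat would quantify the throughput difference.

```go
package cachekv_test

import (
	"strconv"
	"sync"
	"testing"
)

// syncMapStore is a stand-in for the store under test; a real benchmark would
// construct the cachekv store over an in-memory parent instead.
type syncMapStore struct{ m sync.Map }

func (s *syncMapStore) Set(k, v string) { s.m.Store(k, v) }

func (s *syncMapStore) Get(k string) (string, bool) {
	v, ok := s.m.Load(k)
	if !ok {
		return "", false
	}
	return v.(string), true
}

// BenchmarkParallelGet measures contended reads. Run it on each commit with
// `go test -bench=ParallelGet -count=10 > <old|new>.txt` and compare using
// `benchstat old.txt new.txt`.
func BenchmarkParallelGet(b *testing.B) {
	s := &syncMapStore{}
	for i := 0; i < 1024; i++ {
		s.Set("key"+strconv.Itoa(i), "value")
	}
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			s.Get("key" + strconv.Itoa(i%1024))
			i++
		}
	})
}
```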

} else {
value = cacheValue.Value()
return store.parent.Get(key)
Contributor

small nit, we can drop the else block here and just return

Collaborator Author

True, will fix.


// We need a copy of all of the keys.
// Not the best, but probably not a bottleneck depending.
keys := make([]string, 0, store.cache.Len())
keys := []string{}
Contributor

To avoid the allocations, keeping the size at 0 and capacity at store.cache.Len() can actually be better.

Collaborator Author

Yeah, but unfortunately sync.Map doesn't support length, which is why we removed all of the length-based pre-sizing here.
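
For illustration, a small self-contained example of why the pre-sizing was dropped; the cache variable is a stand-in for the store's internal map. sync.Map exposes no Len(), so the only way to collect keys is to walk the map with Range and let the slice grow as needed.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

func main() {
	var cache sync.Map // stand-in for the store's transient cache
	cache.Store("a", []byte("1"))
	cache.Store("b", []byte("2"))

	// No Len() is available, so the slice cannot be pre-sized with a capacity.
	keys := []string{}
	cache.Range(func(key, _ any) bool {
		keys = append(keys, key.(string))
		return true
	})
	sort.Strings(keys)
	fmt.Println(keys) // [a b]
}
```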

}
}

// Clear the cache using the map clearing idiom
// and not allocating fresh objects.
// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
store.cache.DeleteAll()
store.cache.Range(func(key, value any) bool {
Contributor

side note: I wonder if it would make sense to just set the cache to a new map (requires concurrency protection) and let the garbage collector clean up the old one. This is logically fine.
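
A rough sketch of that alternative, with hypothetical names, assuming the reset happens at a point where a brief mutex hold is acceptable:

```go
package main

import (
	"fmt"
	"sync"
)

// transientCache illustrates the suggestion: instead of ranging over the
// sync.Map and deleting every key, swap in a fresh map under a mutex and let
// the garbage collector reclaim the old one.
type transientCache struct {
	mtx   sync.Mutex
	cache *sync.Map
}

func (c *transientCache) reset() {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	c.cache = &sync.Map{} // old map becomes garbage; no per-key deletes needed
}

func main() {
	c := &transientCache{cache: &sync.Map{}}
	c.cache.Store("k", "v")
	c.reset()
	// (in concurrent use, loading c.cache would also need the mutex or an atomic pointer)
	if _, ok := c.cache.Load("k"); !ok {
		fmt.Println("cache cleared")
	}
}
```

The trade-off is that every reader would then need some synchronization just to load the current map pointer, which may be why the PR keeps the existing range-and-delete logic.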

@yzang2019 (Collaborator, Author) commented Dec 4, 2023

Yeah, this is a good question. I've thought about it, but I think it could be risky, and I'm honestly not sure why we didn't do that initially, so it's better to keep the same logic for now.

@yzang2019 merged commit 628c7e4 into main on Dec 5, 2023
15 checks passed
@yzang2019 deleted the yzang/SEI-6061 branch on December 5, 2023 16:12
yzang2019 added a commit that referenced this pull request Jan 17, 2024
## Describe your changes and provide context
This PR reverts the changes in #353 and #391 until we have OCC fully enabled.

## Testing performed to validate your change
Unit test coverage
codchen pushed a commit that referenced this pull request Feb 6, 2024