CBG-4323: fix for per shard memory based eviction #7174

Merged
merged 6 commits into from
Oct 31, 2024

Conversation

gregns1
Contributor

@gregns1 gregns1 commented Oct 30, 2024

CBG-4323

  • Hit a panic in 3.2.1 testing caused by using the overall memory stat for memory based eviction with a sharded rev cache. The panic came from trying to evict from an empty rev cache shard. Because eviction was triggered by the overall rev cache memory usage, it could fire (once overall memory exceeded the configured limit) while adding a new doc to a shard that was empty. Adding that doc triggered eviction, and we then tried to evict from the current shard, which was empty.
  • Fix is to keep a separate memory usage count per shard and evict per shard, aligning with what we do for num items eviction
  • Had to do this through another atomic, as we don't always hold the rev cache lock when incrementing/decrementing memory, and reacquiring the lock would probably be more expensive
  • Added a minimum of 50MB for the cache config
  • Removed 10% buffer per shard
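
A minimal sketch of the per shard approach described in the list above, not the code from this PR. The names `shard`, `entry`, `newShard`, `put` and `perShardMemoryCapacity` are hypothetical, and splitting the configured total evenly across shards is an assumption. Each shard tracks its own bytes in an atomic counter and evicts only against its own capacity, so a nearly empty shard is never asked to evict because other shards are full.

```go
package main

import (
	"container/list"
	"sync"
	"sync/atomic"
)

// entry is a hypothetical cached value; only its key and size matter here.
type entry struct {
	key   string
	bytes int64
}

// shard is a simplified stand-in for one rev cache shard (not the real LRURevisionCache).
type shard struct {
	lock            sync.Mutex
	lru             *list.List // front = most recently used
	items           map[string]*list.Element
	memoryCapacity  int64        // max bytes allowed in this shard
	currMemoryUsage atomic.Int64 // bytes currently held by this shard only
}

func newShard(memoryCapacity int64) *shard {
	return &shard{
		lru:            list.New(),
		items:          make(map[string]*list.Element),
		memoryCapacity: memoryCapacity,
	}
}

// perShardMemoryCapacity splits the configured total evenly, with no extra
// buffer per shard (the even split is an assumption of this sketch).
func perShardMemoryCapacity(totalBytes int64, shardCount int64) int64 {
	return totalBytes / shardCount
}

// put stores a value and then evicts least recently used entries from this
// shard until the shard's own usage fits. It returns the number evicted.
func (s *shard) put(key string, size int64) int {
	s.lock.Lock()
	defer s.lock.Unlock()

	if old, ok := s.items[key]; ok {
		// Replace an existing entry so its bytes are not counted twice.
		s.currMemoryUsage.Add(-s.lru.Remove(old).(*entry).bytes)
	}
	s.items[key] = s.lru.PushFront(&entry{key: key, bytes: size})
	s.currMemoryUsage.Add(size)

	evicted := 0
	// The loop checks this shard's counter, never a cache-wide total, and
	// stops when the shard is empty, so it cannot evict from an empty shard.
	for s.currMemoryUsage.Load() > s.memoryCapacity && s.lru.Len() > 0 {
		e := s.lru.Remove(s.lru.Back()).(*entry)
		delete(s.items, e.key)
		s.currMemoryUsage.Add(-e.bytes)
		evicted++
	}
	return evicted
}
```

In this sketch the counter is an atomic so that code paths which do not hold the shard lock could still adjust it, mirroring the reasoning in the description; here every update happens under the lock for simplicity.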

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Integration Tests

github-actions bot commented Oct 30, 2024

Redocly previews

Collaborator

@adamcfraser adamcfraser left a comment

Generally looks good with some naming/comment suggestions for clarity and a suggestion for a future enhancement ticket.

cacheNumItems *base.SgwIntStat
lock sync.Mutex
capacity uint32 // Max number of items capacity of LRURevisionCache
MaxMemoryCapacity int64 // Max memory capacity of LRURevisionCache
Collaborator

It doesn't look like this needs to be a public property (can be maxMemoryCapacity). Actually - I'm not clear why this needed to change from memoryCapacity.

Contributor Author

Making it public was a mistake. The rename was just to make the distinction from currMemoryCapacity clearer, but after renaming that to currMemoryUsage I can revert it.

// LRURevisionCache object for sharding the rev cache. This way we can perform eviction on per shard basis much like
// we do with the number of items capacity eviction
rc.currMemoryCapacity.Add(bytesCount)
rc.cacheMemoryBytesStat.Add(bytesCount)
Collaborator

In other scenarios like this (for the channel cache in particular), we compute aggregate stats at stat collection time (see UpdateCalculatedStats). The above approach may be fine for 3.2.1 (based on performance results), but filing a tracking ticket to use this approach going forward would be a good idea.
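
As a rough illustration of the suggestion above, the following sketch sums per shard counters only when a stat is collected, instead of updating a cache-wide stat on every add or removal. It does not reproduce UpdateCalculatedStats; the names `shardStats`, `shardedCache`, `newShardedCache` and `totalMemoryBytes` are hypothetical.

```go
package main

import "sync/atomic"

// shardStats holds the per shard counter (hypothetical stand-in for a shard).
type shardStats struct {
	currMemoryUsage atomic.Int64
}

// shardedCache is a hypothetical wrapper that owns all shards.
type shardedCache struct {
	shards []*shardStats
}

func newShardedCache(shardCount int) *shardedCache {
	c := &shardedCache{shards: make([]*shardStats, shardCount)}
	for i := range c.shards {
		c.shards[i] = &shardStats{}
	}
	return c
}

// totalMemoryBytes computes the aggregate on demand, e.g. at stat collection
// time. Each Load is atomic, but the sum is not a consistent snapshot across
// shards; that is usually acceptable for a reporting stat.
func (c *shardedCache) totalMemoryBytes() int64 {
	var total int64
	for _, s := range c.shards {
		total += s.currMemoryUsage.Load()
	}
	return total
}
```

The trade-off in this sketch is one fewer atomic write on the hot path in exchange for a loop over shards each time the stat is read.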

Contributor Author

This was the approach I wanted to take, but it came with complexities around fetching the current memory usage. It would require a new interface method on the rev cache, which I thought might not be wanted at this time (given the time constraints). I can file a ticket to explore this in a future release.

@gregns1 gregns1 assigned adamcfraser and unassigned gregns1 Oct 30, 2024
@bbrks bbrks enabled auto-merge (squash) October 31, 2024 10:55
@bbrks bbrks merged commit bf0276c into main Oct 31, 2024
41 checks passed
@bbrks bbrks deleted the CBG-4323 branch October 31, 2024 10:56
bbrks pushed a commit that referenced this pull request Oct 31, 2024
* CBG-4323: fix for per shard memory based eviction

* add min size, some test assertions and remove 10% buffer on shards

* fix test for CE

* update docs

* fix failing test on default collection

* address comments