
Add cache wrapper that handles parallel access. #861

Merged (5 commits), Nov 6, 2024

Conversation

cody-littley (Contributor) commented:
Why are these changes needed?

This PR adds a wrapper around lru.Cache. The purpose of this wrapper is to facilitate efficient lookup if two RPCs request the same data from the relay concurrently. Ideally we only want to fetch the data once in order to serve both requests.

Example scenario without cache wrapper:

  • at t=0, client A requests data X. X is not present in the cache, so the relay requests it from S3.
  • at t=50, client B requests data X. X is not present in the cache, so the relay requests it from S3.
  • at t=200, the request initiated by client A is completed. X is now present in the cache.
  • at t=250, the request initiated by client B is completed.

Example scenario with cache wrapper:

  • at t=0, client A requests data X. X is not present in the cache, so the relay requests it from S3.
  • at t=50, client B requests data X. X is not present in the cache, but since the lookup is already in progress, B's request is paused until A's request is completed.
  • at t=200, the request initiated by client A is completed. X is now present in the cache.
  • at t=201, the request initiated by client B wakes up and is served with the same data that was fetched for client A.

Checks

  • I've made sure the lint is passing in this PR.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, in that case, please comment that they are not relevant.
  • I've checked the new test coverage and the coverage percentage didn't drop.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

@cody-littley cody-littley self-assigned this Nov 4, 2024
relay/cached_accessor.go (two resolved review threads)
Signed-off-by: Cody Littley <[email protected]>
@ian-shim (Contributor) left a comment:

💯

relay/cached_accessor.go (outdated; resolved review thread)
```go
// is written into the channel when it is eventually fetched. If a key is requested more than once while a
// lookup is in progress, the second (and following) requests will wait for the result of the first lookup
// to be written into the channel.
lookupsInProgress *sync.Map
```
@ian-shim (Contributor):

nit: you don't need to initialize lookupsInProgress if it's typed as sync.Map

@cody-littley (Contributor, Author):

neat, fixed

@cody-littley (Contributor, Author):

I ended up having to back out this change in favor of using a mutex. The core issue is that reading the lookupsInProgress map needs to be atomic with respect to reading from the cache. I was able to provoke a race condition in a unit test that caused an unnecessary cache miss.
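One plausible interleaving behind that race (an illustrative trace, not code from the PR): when the cache and lookupsInProgress are guarded independently, a goroutine can observe both structures in their "miss" state even though a fetch has just completed.

```
G1: cache lookup for X             -> miss (result not yet recorded)
G2: fetch of X completes
G2: cache.Add(X, value)
G2: lookupsInProgress.Delete(X)
G1: lookupsInProgress check for X  -> miss (entry already deleted)
G1: starts a second, unnecessary fetch of X
```

Holding a single mutex across both checks makes the pair atomic and closes this window.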

```go
// Wait for the goroutines to start. We want to give the goroutines a chance to do naughty things if they want.
// Eliminating this sleep will not cause the test to fail, but it may cause the test not to exercise the
// desired race condition.
time.Sleep(100 * time.Millisecond)
```
@ian-shim (Contributor):

Sleep is generally not reliable in tests: there is no guarantee that a fixed sleep is the right amount of time, and that can result in flaky tests.
Maybe we can use something like a buffered channel to wait for all goroutines to trigger?

@cody-littley (Contributor, Author):
In this scenario, I'm not relying on the sleep to make the test pass. The purpose of the sleep statement is to give the background goroutines a chance to misbehave if they are going to.

In order to show that test stability does not depend on the sleep, I've set the test up to run twice: once with the sleep and once without it. Are you ok with this approach? If you'd still prefer to avoid having a sleep, we should chat; one option might just be to remove the sleep entirely.

relay/cached_accessor.go (resolved review thread)
@cody-littley cody-littley merged commit ae8ccaa into Layr-Labs:master Nov 6, 2024
6 checks passed
@cody-littley cody-littley deleted the cached-accessor branch November 6, 2024 14:33