-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: refresh range cache on rangefeed barrier failure #119512
kvserver: refresh range cache on rangefeed barrier failure #119512
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Get
on rangefeed barrier failure
0c7779e
to
8eb5fc2
Compare
Get
on rangefeed barrier failure
As far as I can tell, this should do the right thing. I'm going to add a regression test before marking as ready, but feel free to have an early look @nvanbenschoten. |
Kicked off another 10 runs of |
8eb5fc2
to
8caed65
Compare
8caed65
to
857229a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regression test in place, should be good to go.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/replica_rangefeed.go
line 213 at r2 (raw file):
// // TODO(erikgrinaker): the DistSender should refresh its cache instead. if _, err := tp.r.store.db.Get(ctx, tp.span.Key); err != nil {
Are we cool with using a Get here? Should we consider something else like a LeaseInfo?
Only reason to do so would be to avoid skewing metrics or something.
3d3fdd7
to
bc60d81
Compare
Passed, doing another 20 runs with the updated PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r1, 1 of 2 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @erikgrinaker)
pkg/kv/kvserver/replica_rangefeed.go
line 213 at r2 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Are we cool with using a Get here? Should we consider something else like a LeaseInfo?
Only reason to do so would be to avoid skewing metrics or something.
A Get request seems safer to me, at least in the backport.
pkg/kv/kvserver/replica_rangefeed_test.go
line 1741 at r3 (raw file):
_, _, err = db3.BarrierWithLAI(ctx, span.Key, span.EndKey) require.Error(t, err) require.IsType(t, &kvpb.RangeKeyMismatchError{}, err)
This feels potentially flaky, as we're asserting that the cache has not been updated. I guess we can wait and see, but consider at least leaving a note to a future debugger saying that if this line fails, it can be removed without undermining the purpose of the test.
The stress test is failing on some other rangefeed test, but I can't figure out why -- it doesn't really say anything in the failure log or the server logs:
I'm going to assume this is a flake and start a bors run while I dig into the log dump. |
The DistSender does not refresh its range cache for unsplittable requests. This could cause a rangefeed transaction pusher barrier request to persistently fail following a range merge if the range cache thought the barrier spanned multiple ranges. This would only resolve once the range cache was refreshed by some other request, which might never happen. This in turn would cause the rangefeed's resolved timestamp to stall. Epic: none Release note (bug fix): fixed a bug where rangefeed resolved timestamps could get stuck, continually emitting the log message "pushing old intents failed: range barrier failed, range split", typically following a range merge.
bc60d81
to
0698937
Compare
bors r+
We've seen that flake in #92451 too. |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 0698937 to blathers/backport-release-22.2-119512: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.2.x failed. See errors above. error creating merge commit from 0698937 to blathers/backport-release-23.1-119512: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
The DistSender does not refresh its range cache for unsplittable requests. This could cause a rangefeed transaction pusher barrier request to persistently fail following a range merge if the range cache thought the barrier spanned multiple ranges. This would only resolve once the range cache was refreshed by some other request, which might never happen. This in turn would cause the rangefeed's resolved timestamp to stall.
Resolves #119536.
Resolves #119333.
Epic: none
Release note (bug fix): fixed a bug where rangefeed resolved timestamps could get stuck, continually emitting the log message "pushing old intents failed: range barrier failed, range split", typically following a range merge.