kvcoord: fix error index in case of range split #113111
I did briefly try the suggestion to do it in the same place where we call
To me the current fix seems less risky, so I chose not to investigate this further.
That diff was missing setting Also, the newly added test has a race on updating |
> Let me know which approach is preferable.

I like the approach that you have here.
If you wanted to take it one step further, you could remove the `positions` param from `sendPartialBatch` entirely and have the callers assign it to the response before pushing on the channel. That way, it couldn't be missed.
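
A minimal sketch of that suggestion, not the actual DistSender code: `response`, `sendPartial`, and `dispatch` are hypothetical, simplified stand-ins introduced only for illustration. The point is that the send function no longer receives `positions`; the single call site that pushes onto the channel attaches them, so there is only one place where the assignment can be forgotten.

```go
package main

// response is a hypothetical, simplified stand-in for the per-range result
// that gets pushed onto the response channel.
type response struct {
	numReplies int
	// positions[i] is the index in the parent batch of the i-th request
	// of this partial batch.
	positions []int
}

// sendPartial is a hypothetical stand-in for a send function that takes no
// positions parameter. It only knows about the requests it was handed.
func sendPartial(reqs []string) response {
	return response{numReplies: len(reqs)}
}

// dispatch is the caller: it sends the partial batch, attaches positions to
// the result, and only then pushes the result onto the channel.
func dispatch(reqs []string, positions []int, out chan<- response) {
	resp := sendPartial(reqs)
	resp.positions = positions
	out <- resp
}
```

With this shape, a consumer reading from the channel can rely on every response carrying its positions, regardless of which code path inside the send function produced it.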
> Also, the newly added test has a race on updating the `RangeCache.db` field. I don't think that we want to add any synchronization around that just for the test in the production path; perhaps we could hide it behind `buildutil.CrdbTestBuild`? Any other ideas for getting around that? We could also just skip the test under race.

Did you consider using the `waitThenSwitchToSplitDesc` strategy that `TestDescriptorChangeAfterRequestSubdivision` uses? I believe that avoids the need to synchronize on the `RangeCache.db` field itself.
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained
This commit fixes the DistSender to set the correct error index in case an error is encountered after `divideAndSendBatchToRanges` was called recursively due to a stale range cache. In this scenario, previously we would have the error index set as if the original batch was the one right before the recursive call, which is incorrect in case that batch itself was split (e.g. because the original batch touched multiple ranges, and each sub-batch was executed in parallel). We already had the error index mapping code in place for the main code path, but we forgot to do the error index re-mapping after two recursive calls, which is now fixed.

This commit additionally pulls out the logic to set `response.positions` out of `sendPartialBatch` since there are fewer places to do that one level up.

This bug has been present for a very long time, but it seems relatively minor, so I decided to not include the release note.

Release note: None
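
To make the re-mapping concrete, here is a minimal sketch under simplifying assumptions. The `errorDetail` type and the `remapErrorIndex` helper are hypothetical and are not taken from the kvcoord package; they only illustrate how an error index that is relative to a sub-batch can be translated back through each level of subdivision using a positions slice.

```go
package main

// errorDetail is a hypothetical stand-in for a batch error. Index is the
// position of the failing request within the batch that was actually sent.
type errorDetail struct {
	Index int
}

// remapErrorIndex translates an index that is relative to a sub-batch into
// an index relative to the parent batch. positions[i] holds the index in the
// parent batch of the i-th request of the sub-batch. Out-of-range indexes
// are left untouched.
func remapErrorIndex(err *errorDetail, positions []int) {
	if err == nil || err.Index < 0 || err.Index >= len(positions) {
		return
	}
	err.Index = positions[err.Index]
}
```

As a worked example, suppose the original batch has five requests and a sub-batch carries the requests at positions `[1, 3, 4]`. If a stale range cache forces that sub-batch to be divided again and the inner batch carries the sub-batch's requests at `[0, 2]`, then an error at inner index 1 maps to sub-batch index 2 and then to original index 4. Skipping the inner step would report index 3 instead, which is the kind of mismatch the commit message describes.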
> If you wanted to take it one step further, you could remove the `positions` param from `sendPartialBatch` entirely and have the callers assign it to the response before pushing on the channel. That way, it couldn't be missed.

Nice, I like it, done.
> Did you consider using the `waitThenSwitchToSplitDesc` strategy that `TestDescriptorChangeAfterRequestSubdivision` uses? I believe that avoids the need to synchronize on the `RangeCache.db` field itself.

Oh, indeed, that works, thanks.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
Reviewed 2 of 2 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)
Thanks a lot for the pointers and the review!
bors r+
Build succeeded.
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.
error creating merge commit from 5f8e863 to blathers/backport-release-22.2-113111: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []
you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 22.2.x failed. See errors above.
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
This commit fixes the DistSender to set the correct error index in case an error is encountered after `divideAndSendBatchToRanges` was called recursively due to a stale range cache. In this scenario, previously we would have the error index set as if the original batch was the one right before the recursive call, which is incorrect in case that batch itself was split (e.g. because the original batch touched multiple ranges, and each sub-batch was executed in parallel). We already had the error index mapping code in place for the main code path, but we forgot to do the error index re-mapping after two recursive calls, which is now fixed.

This commit additionally pulls out the logic to set `response.positions` out of `sendPartialBatch` since there are fewer places to do that one level up.

This bug has been present for a very long time, but it seems relatively minor, so I decided to not include the release note.

Fixes: #111481.
Release note: None