-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-23.2: kv: split LockConflictError, revive WriteIntentError over wire #113780
Merged
nvanbenschoten
merged 1 commit into
release-23.2
from
blathers/backport-release-23.2-113646
Nov 3, 2023
Merged
release-23.2: kv: split LockConflictError, revive WriteIntentError over wire #113780
nvanbenschoten
merged 1 commit into
release-23.2
from
blathers/backport-release-23.2-113646
Nov 3, 2023
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Fixes #113271. This commit resolves the backwards incompatibility introduced by 350dc60 when `WriteIntentError` was renamed to `LockConflictError`. This rename broke mixed-version compatibility, because error details in `kvpb.Error` are packaged into an `errorspb.EncodedError`, which internally uses a `protobuf/types.Any`. `protobuf/types.Any` encodes the error's name as a string, relying on the receiving node having a matching type in order to decode the error. Without this, we saw the following logs on v23.1 nodes. ``` error while unmarshalling error: ‹any: message type "cockroach.kv.kvpb.LockConflictError" isn't linked ``` As a result, error handling for requests that used `WaitPolicy_Error` was broken. This commit resolves the issue by re-introducing `WriteIntentError` over the wire, so that v23.1 and v23.2 nodes still use the same name to refer to the same error. It does so without reverting 350dc60 and losing the naming improvement in most of the code by splitting `LockConflictError` into its two distinct roles. `LockConflictError` remains in the kvserver to communicate locking conflicts between batch evaluation and concurrency handling. However, the smaller role of communicating locking conflicts to clients that use a `WaitPolicy_Error`, a lock timeout, or a maximum wait-queue length is split into a "new" error called `WriteIntentError`. Splitting these errors was a cleanup we wanted to do anyway, so this commit just does it now to fix the bug. The unfortunate naming of `WriteIntentError` is a battle that we can fight another day. While this commit doesn't introduce any new tests, we have sufficient testing of the two uses of `WriteIntentError` for single-version clusters in the unit tests. For mixed-version clusters, we have the `backup-restore/mixed-version` roachtest, which caught the bug and exercises backup's use of `WriteIntentError`. The remaining place where this broke mixed-version compatibility was `SELECT FOR UPDATE NOWAIT`. We should add mixed-version testing for all `SELECT FOR UPDATE` variants. In the meantime, I have manually verified that the following script works on a mixed-version cluster: ``` roachprod create nathan-113271 -n3 roachprod stage nathan-113271 release v23.1.11 roachprod start nathan-113271 roachprod stop nathan-113271:2-3 roachprod put nathan-113271:2-3 cockroach # with this commit roachprod start nathan-113271:2-3 --sequential=false roachprod sql nathan-113271:1 roachprod sql nathan-113271:2 -- either shell create table t(i int primary key); insert into t values (1); select lease_holder from [show ranges from table t with details]; alter table t scatter; -- 23.2 shell begin; select * from t for update; -- 23.1 shell begin; select * from t for update nowait; -- if broken: ERROR: conflicting locks on /Table/104/1/1/0 [reason=wait_policy] -- if fixed: ERROR: could not obtain lock on row (i)=(1) in t@t_pkey -- same thing but in opposite direction, with 23.1 leaseholder and 23.2 gateway ``` Release note: None
blathers-crl
bot
force-pushed
the
blathers/backport-release-23.2-113646
branch
from
November 3, 2023 20:04
1afc563
to
d777b31
Compare
blathers-crl
bot
added
the
blathers-backport
This is a backport that Blathers created automatically.
label
Nov 3, 2023
blathers-crl
bot
force-pushed
the
blathers/backport-release-23.2-113646
branch
from
November 3, 2023 20:04
ae8e886
to
48c0dc6
Compare
blathers-crl
bot
requested review from
rharding6373
and removed request for
a team
November 3, 2023 20:04
Thanks for opening a backport. Please check the backport criteria before merging:
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
Also, please add a brief release justification to the body of your PR to justify this |
blathers-crl
bot
requested review from
arulajmani,
DrewKimball and
nvanbenschoten
November 3, 2023 20:04
blathers-crl
bot
added
the
backport
Label PR's that are backports to older release branches
label
Nov 3, 2023
arulajmani
approved these changes
Nov 3, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
backport
Label PR's that are backports to older release branches
blathers-backport
This is a backport that Blathers created automatically.
O-robot
Originated from a bot.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport 1/1 commits from #113646 on behalf of @nvanbenschoten.
/cc @cockroachdb/release
Fixes #113271.
This commit resolves the backwards incompatibility introduced by 350dc60 when
WriteIntentError
was renamed toLockConflictError
. This rename broke mixed-version compatibility, because error details inkvpb.Error
are packaged into anerrorspb.EncodedError
, which internally uses aprotobuf/types.Any
.protobuf/types.Any
encodes the error's name as a string, relying on the receiving node having a matching type in order to decode the error.Without this, we saw the following logs on v23.1 nodes.
As a result, error handling for requests that used
WaitPolicy_Error
was broken.This commit resolves the issue by re-introducing
WriteIntentError
over the wire, so that v23.1 and v23.2 nodes still use the same name to refer to the same error. It does so without reverting 350dc60 and losing the naming improvement in most of the code by splittingLockConflictError
into its two distinct roles.LockConflictError
remains in the kvserver to communicate locking conflicts between batch evaluation and concurrency handling. However, the smaller role of communicating locking conflicts to clients that use aWaitPolicy_Error
, a lock timeout, or a maximum wait-queue length is split into a "new" error calledWriteIntentError
. Splitting these errors was a cleanup we wanted to do anyway, so this commit just does it now to fix the bug. The unfortunate naming ofWriteIntentError
is a battle that we can fight another day.While this commit doesn't introduce any new tests, we have sufficient testing of the two uses of
WriteIntentError
for single-version clusters in the unit tests. For mixed-version clusters, we have thebackup-restore/mixed-version
roachtest, which caught the bug and exercises backup's use ofWriteIntentError
.The remaining place where this broke mixed-version compatibility was
SELECT FOR UPDATE NOWAIT
. We should add mixed-version testing for allSELECT FOR UPDATE
variants. In the meantime, I have manually verified that the following script works on a mixed-version cluster:Release note: None
Release justification: avoids unexpected errors during mixed version backups.