Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: bump ReplicaChecksumVersion, disable cross v23.2/v24.1 consistency checks #117304

Merged

Conversation

nvanbenschoten
Copy link
Member

@nvanbenschoten nvanbenschoten commented Jan 3, 2024

Informs #101938.
Fixes #117302.

This commit bumps the ReplicaChecksumVersion to disable replica consistency checks between v23.2 and v24.1 nodes when in a mixed-version cluster. This avoids the backwards incompatibility discussed in #117302.

While here and permitted to change the replica consistency check logic, we unset the Synthetic flag from RangeAppliedState.RaftClosedTimestamp during stats-only consistency checks. This form of consistency check is rarely used, but this prevents it from causing trouble with #101938.

While here and allowed to change the consistency check hash computation, we also switch from using the LegacyTimestamp encoding to the Timestamp encoding for the hash contribution of MVCCKeys.

Release note: None

Copy link

blathers-crl bot commented Jan 3, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten nvanbenschoten added the backport-23.2.x Flags PRs that need to be backported to 23.2. label Jan 3, 2024
@nvanbenschoten
Copy link
Member Author

@erikgrinaker @pav-kv I'd like to get your input on this option for resolving #117302. The benefit is that it's simple and it doesn't require a revert of #105523 1. The downside is that it uses the ReplicaChecksumVersion mechanism to disallow cross v23.1/v23.2 consistency checks, so we'll lose some consistency check coverage in mixed-version clusters that are running multiple binary versions. I'm leaning towards this being the right approach because clusters aren't meant to hang around in this state for a long time (just during the rolling restart), so the simplicity argument would win out. Curious to get your thoughts.

In the meantime, I'll run this with #117294 for a few dozen iterations overnight to confirm that it resolves the inconsistency.

Footnotes

  1. this would push back https://github.com/cockroachdb/cockroach/issues/101938 by a major releases and make the eventual migration more involved. If we never wanted to bump ReplicaChecksumVersion then we'd need to add a targeted version gate to consistency check requests to determine whether v24.1 nodes should ignore the synthetic bit in mvcc keys or not. Only in v24.2 could we then remove the field from the proto.

@erikgrinaker
Copy link
Contributor

This seems fine to me. It's why ReplicaChecksumVersion exists in the first place.

You've verified that we don't have any further dependencies on Synthetic here?

@pav-kv
Copy link
Collaborator

pav-kv commented Jan 4, 2024

SGTM

@pav-kv
Copy link
Collaborator

pav-kv commented Jan 4, 2024

Since we're bumping the version anyway, do we want to do any drive-by changes/refactorings to the hash computation? For example, #117302 mentions we're using ToLegacyTimestamp. Can we untangle from something named "legacy"?

…y checks

Informs cockroachdb#101938.

This commit bumps the ReplicaChecksumVersion to disable replica consistency
checks between v23.2 and v24.1 nodes when in a mixed-version cluster. This
avoids the backwards incompatibility discussed in cockroachdb#117302.

While here and permitted to change the replica consistency check logic, we
unset the Synthetic flag from RangeAppliedState.RaftClosedTimestamp during
stats-only consistency checks. This form of consistency check is rarely used,
but this prevents it from causing trouble with cockroachdb#101938.

While here and allowed to change the consistency check hash computation, we
also switch from using the LegacyTimestamp encoding to the Timestamp encoding
for the hash contribution of MVCCKeys.

Release note: None
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/bumpReplConVersion branch from ca1c61b to 63ca265 Compare January 4, 2024 21:36
@nvanbenschoten nvanbenschoten changed the title kv: bump ReplicaChecksumVersion, disable cross v23.1/v23.2 consistency checks kv: bump ReplicaChecksumVersion, disable cross v23.2/v24.1 consistency checks Jan 4, 2024
@nvanbenschoten nvanbenschoten marked this pull request as ready for review January 5, 2024 21:51
@nvanbenschoten nvanbenschoten requested a review from a team as a code owner January 5, 2024 21:51
@nvanbenschoten nvanbenschoten requested a review from a team January 5, 2024 21:51
@nvanbenschoten nvanbenschoten removed the backport-23.2.x Flags PRs that need to be backported to 23.2. label Jan 5, 2024
@nvanbenschoten
Copy link
Member Author

@erikgrinaker and @pav-kv, I've updated this PR with your suggestions. Now that #117324 has landed and all of the synthetic timestamp migration will land in v24.1, #117302 is no longer an issue for v23.1/v23.2 mixed-version clusters, but it will still be an issue for the eventual v23.2/v24.1 state. This PR will avoid that, so I'm planning to land it just on master.

PTAL when you get a chance.

Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @pav-kv)

@nvanbenschoten
Copy link
Member Author

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 8, 2024

This PR was included in a batch that timed out, it will be automatically retried

@craig
Copy link
Contributor

craig bot commented Jan 8, 2024

Build succeeded:

@craig craig bot merged commit 5bd4ca0 into cockroachdb:master Jan 8, 2024
13 checks passed
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/bumpReplConVersion branch January 9, 2024 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kv: replica inconsistency false positive in mixed-version cluster with global table
4 participants