
roachtest: schemachange/mixed-versions failed ["show cluster setting version" timed out: value differs between local setting and KV] #129460

Closed
cockroach-teamcity opened this issue Aug 22, 2024 · 4 comments · Fixed by #129520
Assignees
Labels
branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. P-1 Issues/test failures with a fix SLA of 1 month T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@cockroach-teamcity

cockroach-teamcity commented Aug 22, 2024

roachtest.schemachange/mixed-versions failed with artifacts on release-24.2 @ 310388de99f5b0b708a6b286581d79bb4df6f34d:

(mixedversion.go:710).Run: mixed-version test failure while running step 12 (run "run schemachange workload and validation in mixed version"): full command output in run_072406.811148213_n4_COCKROACHRANDOMSEED5.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/schemachange/mixed-versions/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

/cc @cockroachdb/sql-foundations

This test on roachdash | Improve this report!

Jira issue: CRDB-41550

@cockroach-teamcity cockroach-teamcity added branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Aug 22, 2024
@rafiss

rafiss commented Aug 22, 2024

E240822 07:26:12.364841 1 workload/cli/run.go:591  [-] 3  ***UNEXPECTED ERROR; Failed to generate a random operation: ERROR: operation "show cluster setting version" timed out after 2m0.001s (given timeout 2m0s): value differs between local setting (24.2-upgrading-step--01) and KV (24.1-upgrading-to-24.2-step-012) (SQLSTATE XXUUU)
Error: ***UNEXPECTED ERROR; Failed to generate a random operation: ERROR: operation "show cluster setting version" timed out after 2m0.001s (given timeout 2m0s): value differs between local setting (24.2-upgrading-step--01) and KV (24.1-upgrading-to-24.2-step-012) (SQLSTATE XXUUU)

@rafiss rafiss changed the title roachtest: schemachange/mixed-versions failed roachtest: schemachange/mixed-versions failed ["show cluster setting version" timed out: value differs between local setting and KV] Aug 22, 2024
@rafiss rafiss self-assigned this Aug 22, 2024
@rafiss

rafiss commented Aug 22, 2024

The error "value differs between local setting (24.2-upgrading-step--01) and KV (24.1-upgrading-to-24.2-step-012)" indicates that the local setting is set to a fence version, as evidenced by the odd internal version number. The strange part is that the internal version appears to be -1, which is why it was formatted with the extra dash (step--01).

I read through the code and found that this is possible because of the FenceVersionFor logic:

// FenceVersionFor constructs the appropriate "fence version" for the given
// cluster version. Fence versions allow the upgrades infrastructure to safely
// step through consecutive cluster versions in the presence of Nodes (running
// any binary version) being added to the cluster. See the upgrade manager
// above for intended usage.
//
// Fence versions (and the upgrades infrastructure entirely) were introduced
// in the 21.1 release cycle. In the same release cycle, we introduced the
// invariant that new user-defined versions (users being crdb engineers) must
// always have even-numbered Internal versions, thus reserving the odd numbers
// to slot in fence versions for each cluster version. See top-level
// documentation in pkg/clusterversion for more details.
func FenceVersionFor(
	ctx context.Context, cv clusterversion.ClusterVersion,
) clusterversion.ClusterVersion {
	if (cv.Internal % 2) != 0 {
		log.Fatalf(ctx, "only even numbered internal versions allowed, found %s", cv.Version)
	}
	// We'll pick the odd internal version preceding the cluster version,
	// slotting ourselves right before it.
	fenceCV := cv
	fenceCV.Internal--
	return fenceCV
}

When the internal version is 0, as it is for all final versions, the corresponding fence version has an internal version of -1.

The change in #99967 was meant to allow the local and KV versions to differ by one fence version, but the logic in checkClusterSettingValuesAreEquivalent does not account for the -1 internal version described above.

This should be fixable with a change to checkClusterSettingValuesAreEquivalent. cc'ing @RaduBerinde, since you've been working with the cluster version code a lot. Does my reasoning make sense to you? If so, my proposed fix is here: #129520
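
One pitfall that any check for fence versions has to handle when -1 is possible is Go's remainder operator, whose result takes the sign of the dividend. The sketch below is an illustration only: isFenceOddOnly and isFence are hypothetical helpers, and this is not a description of the actual code in checkClusterSettingValuesAreEquivalent or of the change in #129520.

```go
package main

import "fmt"

// isFenceOddOnly is a hypothetical parity test that compares the remainder
// against 1. In Go, -1 % 2 evaluates to -1, so this misses a fence version
// whose internal version is -1.
func isFenceOddOnly(internal int32) bool {
	return internal%2 == 1
}

// isFence is a hypothetical parity test that compares the remainder against
// 0, the same form FenceVersionFor uses. It treats -1 as odd.
func isFence(internal int32) bool {
	return internal%2 != 0
}

func main() {
	for _, internal := range []int32{-1, 0, 11, 12} {
		fmt.Printf("internal=%d oddOnly=%t fence=%t\n",
			internal, isFenceOddOnly(internal), isFence(internal))
	}
}
```

For an internal version of -1 the two helpers disagree: isFenceOddOnly reports false while isFence reports true.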

@cockroach-teamcity

roachtest.schemachange/mixed-versions failed with artifacts on release-24.2 @ a956ba3b4efcb80ffe9a2ca7b818094975285942:

(mixedversion.go:710).Run: mixed-version test failure while running step 13 (run "run schemachange workload and validation in mixed version"): full command output in run_070757.449659469_n4_COCKROACHRANDOMSEED5.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/schemachange/mixed-versions/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@rafiss rafiss added P-1 Issues/test failures with a fix SLA of 1 month and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Aug 27, 2024
@craig craig bot closed this as completed in 79de4ac Aug 27, 2024

blathers-crl bot commented Aug 27, 2024

Based on the specified backports for linked PR #129520, I applied the following new label(s) to this issue: branch-release-23.2, branch-release-24.1. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 labels Aug 27, 2024