pkg/ccl/kvccl/kvtenantccl/upgradeinterlockccl/upgradeinterlockccl_test: TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances failed #121446

Closed
cockroach-teamcity opened this issue Apr 1, 2024 · 2 comments
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. P-2 Issues/test failures with a fix SLA of 3 months T-testeng TestEng Team

@cockroach-teamcity
Member

cockroach-teamcity commented Apr 1, 2024

pkg/ccl/kvccl/kvtenantccl/upgradeinterlockccl/upgradeinterlockccl_test.TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances failed on master @ 7fc4c7bcbbf0c75a62d056da0bf79a5a32714650:

=== RUN   TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances
    test_log_scope.go:170: test logs captured to: outputs.zip/logTestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances23545179
    test_log_scope.go:81: use -show-logs to present logs inline
    local_test_util_test.go:52: (TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances) upgrade interlock test: running variant "lagging_binary_version", configuration: "pause_after_second_check_of_instances"
    local_test_util_test.go:176: (TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances) creating an initial tenant server
    local_test_util_test.go:184: (TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances) verifying the tenant version
    local_test_util_test.go:188: (TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances) verifying basic SQL functionality
    local_test_util_test.go:193: (TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances) verifying the version of the storage cluster
    local_test_util_test.go:200: (TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances) upgrading the storage cluster
    local_test_util_test.go:203: (TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances) checking the tenant after the storage cluster upgrade
    local_test_util_test.go:207: (TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances) start upgrading the tenant
    local_test_util_test.go:262: (TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances) upgrader is ready
    local_test_util_test.go:264: (TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances) starting another tenant server
    local_test_util_test.go:306: (TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances) shutting down the other tenant server
    local_test_util_test.go:315: (TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances) waiting for the instance table to get in the right state
    local_test_util_test.go:326: (TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances) resuming upgrade
    local_test_util_test.go:328: (TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances) waiting for upgrade to complete
    local_test_util_test.go:330: (TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances) upgrade completed
    local_test_util_test.go:339: 
        	Error Trace:	pkg/ccl/kvccl/kvtenantccl/upgradeinterlockccl/upgradeinterlockccl_test/pkg/ccl/kvccl/kvtenantccl/upgradeinterlockccl/local_test_util_test.go:339
        	            				pkg/ccl/kvccl/kvtenantccl/upgradeinterlockccl/upgradeinterlockccl_test/bazel-out/k8-fastbuild/bin/pkg/ccl/kvccl/kvtenantccl/upgradeinterlockccl/generated_test.go:129
        	Error:      	Error "deadline below read timestamp is nonsensical; txn has would have no chance to commit. Deadline: 1711957135.548556104,0. Read timestamp: 1711957135.568160952,0 Previous Deadline: 0,0." does not contain "preventing SQL server from starting because its binary version is too low for the tenant active version"
        	Test:       	TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances
    panic.go:626: -- test log scope end --
test logs left over in: outputs.zip/logTestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances23545179
--- FAIL: TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances (52.95s)

Parameters:

  • attempt=1
  • deadlock=true
  • run=1
  • shard=10
Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/test-eng

This test on roachdash | Improve this report!

Jira issue: CRDB-37266

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-testeng TestEng Team labels Apr 1, 2024
@cockroach-teamcity cockroach-teamcity added this to the 24.1 milestone Apr 1, 2024
@srosenberg
Member

srosenberg commented Apr 3, 2024

I was unable to reproduce this locally after 10k iterations:

./dev test pkg/ccl/kvccl/kvtenantccl/upgradeinterlockccl/ -f=TestTenantUpgradeInterlock_lagging_binary_version_pause_after_second_check_of_instances -v --stress --count 10000

On closer examination, this appears to be a flaky test, possibly caused by overload while running in CI. This failure mode, i.e., an assertion failure in UpdateDeadline, was addressed a while ago in [1], but it appears to have regressed in [2]. Specifically, the changes in [2] may have caused UpdateDeadline to be called with a session expiration of 0 as the new deadline, thereby violating the invariant.

@fqazi Do you have any idea why that may be the case? (i.e., why would the session expiration be 0)

I'm removing the release-blocker label, but we should continue to investigate this flake.

[1] #99760
[2] #120006

@srosenberg srosenberg removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Apr 3, 2024
@renatolabs renatolabs added the P-2 Issues/test failures with a fix SLA of 3 months label Apr 17, 2024
@renatolabs
Contributor

Given the lack of activity and no further occurrences of this issue, I'm going to close it.
