ccl/backupccl: TestRestoreErrorPropagates failed #98037

cockroach-teamcity · 2023-03-05T07:03:46Z

ccl/backupccl.TestRestoreErrorPropagates failed with artifacts on master @ cf14ad694ee562676f53e36fa8495206c3aed61f:

=== RUN   TestRestoreErrorPropagates
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/95e138e66d69292427dfb9528cf06d04/logTestRestoreErrorPropagates1393028586
    test_log_scope.go:79: use -show-logs to present logs inline
    backup_test.go:6286: 
        	Error Trace:	/home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/4704/execroot/com_github_cockroachdb_cockroach/bazel-out/k8-fastbuild/bin/pkg/ccl/backupccl/backupccl_test_/backupccl_test.runfiles/com_github_cockroachdb_cockroach/pkg/ccl/backupccl/backup_test.go:6286
        	Error:      	Expect "pq: job-row-insert: boom 3" to match "boom 1"
        	Test:       	TestRestoreErrorPropagates
    panic.go:540: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/95e138e66d69292427dfb9528cf06d04/logTestRestoreErrorPropagates1393028586
--- FAIL: TestRestoreErrorPropagates (27.21s)

Parameters: TAGS=bazel,gss,race

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/disaster-recovery

This test on roachdash | Improve this report!

Jira issue: CRDB-25035


98741: ci: update bazel builder image r=rickystewart a=cockroach-teamcity

Release note: None
Epic: None

98878: backupccl: fix occassional TestRestoreErrorPropagates flake r=stevendanna a=adityamaru

Very rarely under stress race another automatic job would race with the restore and increment the error count. This would result in the count being greater than our expected value of 1.

This disables all the automatic jobs eliminating the chance of this race.

Fixes: #98037
Release note: None

99099: kvserver: deflake TestReplicaTombstone r=andrewbaptist a=tbg

Like many other tests, this test could flake because we'd sometimes catch a "cannot remove learner while snapshot is in flight" error.

I think the root cause is that sometimes there are errant Raft snapshots in the system[^1] and these get mistaken for LEARNERs that are still being caught up by the replicate queue. I tried to address this general class of issues by making the check for in-flight learner snapshots not care about *raft* snapshots.

I was able to stress TestReplicaTombstone for 30+ minutes without a failure using that approach, whereas previously it usually failed within a few minutes.

```
./dev test --stress pkg/kv/kvserver/ --filter TestReplicaTombstone 2>&1 | tee stress.log
[...]
2461 runs so far, 0 failures, over 35m45s
```

[^1]: #87553

Fixes #98883.
Epic: none
Release note: None

99126: kv: return error on locking request in LeafTxn r=nvanbenschoten a=miraradeva

Previously, as noted in #94290, it was possible for a LeafTxn to issue locking requests as part of SELECT FOR UPDATE. This behavior was unexpected and the RootTxn wasn't properly cleaning up the locks, resulting in others waiting for those locks to be released. The issue was resolved, in #94399, by ensuring non-default locking strength transactions don't use the streamer API and always run as RootTxn.

This patch adds an assertion on the kv side to prevent other existing or future attempts of LeafTxn issuing locking requests. We don't expect that there are such existing cases, so we don't expect this assertion to fail, but will keep an eye on the nightly tests to make sure.

Fixes: #97817
Release note: None

99150: backupccl: stop logging unsanitized backup stmt in schedule executor r=stevendanna a=msbutler

Informs #99145
Release note: None

Co-authored-by: cockroach-teamcity <[email protected]>
Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Mira Radeva <[email protected]>
Co-authored-by: Michael Butler <[email protected]>

Very rarely, under stress race, another automatic job would race with the restore and increment the error count. This would result in the count being greater than our expected value of 1. This disables all the automatic jobs, eliminating the chance of this race.

Fixes: #98037
Release note: None
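To make the failure mode concrete, here is a minimal Go sketch of the race described above. It is not the code from backup_test.go: `errorInjector`, `onInsert`, and `restoreError` are hypothetical names, and the sketch only assumes that the test fails every job-row insert through a hook that numbers its errors with a shared counter. Under that assumption, any automatic job that reaches the hook before the restore bumps the counter, so the restore reports "boom 3" instead of the expected "boom 1".

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// errorInjector is a hypothetical stand-in for a testing hook that fails
// every job-row insert and numbers the failures with a shared counter.
type errorInjector struct {
	count int64
}

// onInsert increments the shared counter and returns the numbered error.
func (e *errorInjector) onInsert() error {
	return fmt.Errorf("boom %d", atomic.AddInt64(&e.count, 1))
}

// restoreError simulates backgroundJobs automatic jobs hitting the hook
// before the restore does, then returns the error text the restore sees.
func restoreError(backgroundJobs int) string {
	var inj errorInjector
	var wg sync.WaitGroup
	for i := 0; i < backgroundJobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = inj.onInsert() // the automatic job's own failure is ignored here
		}()
	}
	wg.Wait()
	return fmt.Sprintf("job-row-insert: %v", inj.onInsert())
}

func main() {
	fmt.Println(restoreError(0)) // automatic jobs disabled: job-row-insert: boom 1
	fmt.Println(restoreError(2)) // two automatic jobs ran first: job-row-insert: boom 3
}
```

With no automatic jobs the restore always observes the first injected error, which is why disabling them removes the flake without loosening the assertion.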

cockroach-teamcity added the branch-master (Failures and bugs on the master branch), C-test-failure (Broken test, automatically or manually discovered), and O-robot (Originated from a bot) labels Mar 5, 2023

cockroach-teamcity added this to the 23.1 milestone Mar 5, 2023

blathers-crl bot added the T-disaster-recovery label Mar 5, 2023

exalate-issue-sync bot assigned stevendanna and adityamaru and unassigned stevendanna Mar 15, 2023

adityamaru mentioned this issue Mar 17, 2023

backupccl: fix occassional TestRestoreErrorPropagates flake #98878

Merged

craig bot closed this as completed in f8557fc Mar 22, 2023

blathers-crl bot mentioned this issue Mar 22, 2023

release-23.1: backupccl: fix occassional TestRestoreErrorPropagates flake #99285

Merged

github-project-automation bot added this to Disaster Recovery Backlog Aug 28, 2024

github-project-automation bot moved this to Done in Disaster Recovery Backlog Aug 28, 2024
