roachtest: clearrange/zfs/checks=true failed #108726

Closed
cockroach-teamcity opened this issue Aug 14, 2023 · 4 comments · Fixed by #108786

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Aug 14, 2023

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ c13bf7633cbb416d9e43f8c57b1e309fab1110ce:

(cluster.go:2279).Run: output in run_165250.783169826_n1-10_cockroach-workload-r: ./cockroach workload run kv --concurrency=32 --duration=1h --tolerate-errors returned: context canceled
(monitor.go:140).Wait: monitor failure: unexpected node event: n5: cockroach process died (exit code 7)
(test_runner.go:1122).func1: 1 dead node(s) detected
test artifacts and logs in: /artifacts/clearrange/zfs/checks=true/run_1

Parameters: ROACHTEST_arch=amd64, ROACHTEST_cloud=gce, ROACHTEST_cpu=16, ROACHTEST_encrypted=true, ROACHTEST_fs=zfs, ROACHTEST_localSSD=true, ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/storage

This test on roachdash | Improve this report!

Jira issue: CRDB-30614

@cockroach-teamcity added labels Aug 14, 2023: branch-master (failures and bugs on the master branch), C-test-failure (broken test, automatically or manually discovered), O-roachtest, O-robot (originated from a bot), release-blocker (indicates a release-blocker; use with a branch-release-2x.x label to denote which branch is blocked), T-storage (Storage Team)
@cockroach-teamcity added this to the 23.2 milestone Aug 14, 2023
@itsbilal
Member

n5 crashed due to a replica inconsistency in system bytes:

F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726  found a delta of {ContainsEstimates:0 LastUpdateNanos:1692033317996514232 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:0 RangeKeyBytes:0 RangeValCount:0 RangeValBytes:0 SysBytes:6 SysCount:0 AbortSpanBytes:0}
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 !goroutine 4025617 [running]:
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 !github.com/cockroachdb/cockroach/pkg/util/allstacks.GetWithBuf({0x0?, 0xc027dc8690?, 0x46c69f?})
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 ! github.com/cockroachdb/cockroach/pkg/util/allstacks/allstacks.go:32 +0x85
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 !github.com/cockroachdb/cockroach/pkg/util/allstacks.Get(...)
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 ! github.com/cockroachdb/cockroach/pkg/util/allstacks/allstacks.go:19
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 !github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xc000f77520, {{{0xc003e5e000, 0x24}, {0x623503d, 0x1}, {0x6235039, 0x1}, {0x623503d, 0x1}}, 0x177b4f5b880cf957, ...})
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 ! github.com/cockroachdb/cockroach/pkg/util/log/clog.go:276 +0x99
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 !github.com/cockroachdb/cockroach/pkg/util/log.logfDepthInternal({0x78bacb8, 0xc004cc5aa0}, 0x2, 0x4, 0x0, 0x0?, {0x61bdefe, 0x14}, {0xc03536c3b8, 0x1, ...})
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 ! github.com/cockroachdb/cockroach/pkg/util/log/channels.go:106 +0x645
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 !github.com/cockroachdb/cockroach/pkg/util/log.logfDepth(...)
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 ! github.com/cockroachdb/cockroach/pkg/util/log/channels.go:39
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 !github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 ! github.com/cockroachdb/cockroach/bazel-out/k8-opt/bin/pkg/util/log/log_channels_generated.go:848
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).checkConsistencyImpl(0xc007b0b200, {0x78bacb8, 0xc004cc5aa0}, {{{}, {0xc00d328698, 0x7, 0x8}, {0x0, 0x0, 0x0}, ...}, ...})
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 ! github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_consistency.go:225 +0xd6c
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).CheckConsistency(0x0?, {0x78bacb8, 0xc004cc5aa0}, {{{}, {0x0, 0x0, 0x0}, {0x0, 0x0, 0x0}, ...}, ...})
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 ! github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_consistency.go:84 +0x110
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*consistencyQueue).process(0xc000e81a70, {0x78bacb8, 0xc004cc5aa0}, 0xc007b0b200, {0x0?, 0x0?})
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 ! github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/consistency_queue.go:185 +0x1b0
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*mergeQueue).process(0xc000e817e8, {0x78bacb8, 0xc004cc5aa0}, 0xc007b0b200, {0x7f00efb4cd20, 0xc002404ea0})
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 ! github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/merge_queue.go:417 +0x1693
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplica.func1({0x78bacb8, 0xc004cc5aa0})
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 ! github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue.go:1018 +0x275
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 !github.com/cockroachdb/cockroach/pkg/util/timeutil.RunWithTimeout({0x78bacf0?, 0xc022189710?}, {0xc024a28e80, 0x1f}, 0xdf8475800, 0xc005bd5e20)
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 ! github.com/cockroachdb/cockroach/pkg/util/timeutil/timeout.go:29 +0xdb
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplica(0xc000c53340, {0x78bacf0, 0xc0221896e0}, {0x790a5a0, 0xc007b0b200})
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 ! github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue.go:977 +0x48e
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processLoop.func2.1({0x78bacf0, 0xc0221896e0})
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 ! github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue.go:888 +0x117
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 !github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 ! github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484 +0x146
F230814 17:15:39.550923 4025617 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n5,merge,s5,r790/1:‹/Table/106/1/40{758773-906493}›] 726 !created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx
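
The delta in the first fatal line is easier to read once the zero fields are filtered out. A minimal Python sketch, assuming the `Field:value` layout shown in that log line and treating `LastUpdateNanos` as a timestamp rather than a counter; the helper name `nonzero_fields` is hypothetical and not part of CockroachDB:

```python
import re

# Delta copied verbatim from the fatal log line above.
DELTA = (
    "{ContainsEstimates:0 LastUpdateNanos:1692033317996514232 IntentAge:0 "
    "GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 "
    "ValCount:0 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 "
    "RangeKeyCount:0 RangeKeyBytes:0 RangeValCount:0 RangeValBytes:0 "
    "SysBytes:6 SysCount:0 AbortSpanBytes:0}"
)


def nonzero_fields(delta, ignore=("LastUpdateNanos",)):
    """Return the stats fields whose delta is non-zero.

    `ignore` lists fields that are not counters (assumption: LastUpdateNanos
    is a timestamp, so its value is not a meaningful mismatch).
    """
    fields = {name: int(value) for name, value in re.findall(r"(\w+):(-?\d+)", delta)}
    return {k: v for k, v in fields.items() if v != 0 and k not in ignore}


print(nonzero_fields(DELTA))  # {'SysBytes': 6}
```

Only `SysBytes` differs, by 6 bytes, which matches the "system bytes" diagnosis.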

@jbowens
Collaborator

jbowens commented Aug 15, 2023

Possibly related to #93896?

@itsbilal
Member

Based on a conversation with the KV team on internal Slack, stats deltas in SysBytes are normal and expected due to non-determinism and races with lease acquisitions. Since the clearrange roachtest is the only place in CockroachDB that asserts on MVCC stats mismatches, I'll update this roachtest to not fail on stats mismatches for now.

@itsbilal removed the release-blocker label Aug 15, 2023
@itsbilal
Member

@jbowens yep, it is. For some reason it also didn't notify me of your comment until I sent mine 😬. Something is weirdly messed up with GitHub notifications for me on just this issue.

craig bot pushed a commit that referenced this issue Aug 16, 2023
108673: kvstreamer: minor cleanup and unification r=yuzefovich a=yuzefovich

This commit performs some minor cleanup to unify the code a bit between GetResp and ScanResp. The only change that is not a no-op is that we now `nil` out `ResumeSpan` on GetResponses as well as on ScanResponses when the SingleRowLookup hint is `false`. Originally, we unset it for all ScanResponses, but this was lost in 6343df3, making the behavior depend on the hint; GetResponses never had this `nil`ing out in the first place. The rationale for unsetting the ResumeSpan for both response types is somewhat weak (it avoids confusing the Streamer's user, which doesn't actually inspect the ResumeSpan field, and allows the keys to be GC'ed sooner), but it's better for this behavior to be unified.

Epic: None

Release note: None

108786: roachtest: allow stats mismatches in clearrange roachtest r=RaduBerinde a=itsbilal

Previously, the clearrange roachtest was the only place anywhere in the CockroachDB codebase where we would assert on MVCC stats matching between replicas. This would trip up and fail the clearrange roachtest even in known cases of MVCC stats mismatches. This change removes the code to assert on stats mismatches with consistency checks, but retains the clearrange roachtest's use of aggressive consistency checks, so mismatches in checksums computed on data in each replica will continue to fatal the test.
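
The resulting policy can be sketched as follows. This is an illustrative Python model only, not the roachtest's actual Go code; `CheckResult` and `should_fail_test` are hypothetical names introduced for the example:

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Hypothetical summary of one consistency check on a range."""
    checksum_mismatch: bool     # replicas disagree on checksums of their data
    stats_delta_nonzero: bool   # MVCC stats differ (e.g. a SysBytes delta)


def should_fail_test(result: CheckResult) -> bool:
    # After the change described above: a stats delta alone is tolerated,
    # while a checksum mismatch still fails (fatals) the test.
    return result.checksum_mismatch


# A SysBytes-only delta like the one in this issue no longer fails the test.
print(should_fail_test(CheckResult(checksum_mismatch=False, stats_delta_nonzero=True)))  # False
# Diverging data between replicas still does.
print(should_fail_test(CheckResult(checksum_mismatch=True, stats_delta_nonzero=False)))  # True
```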

Related to #93896.

Fixes #108726.

Epic: none

Release note: None

Co-authored-by: Yahor Yuzefovich
Co-authored-by: Bilal Akhtar
craig bot closed this as completed in 96a031e Aug 16, 2023
@jbowens moved this to Done in [Deprecated] Storage Jun 4, 2024