roachtest: clearrange/checks=true/rangeTs=true failed #104078

Closed
cockroach-teamcity opened this issue May 30, 2023 · 4 comments
Labels
A-kv-replication Relating to Raft, consensus, and coordination. A-testing Testing tools and infrastructure branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot.
Comments

@cockroach-teamcity
Member

cockroach-teamcity commented May 30, 2023

roachtest.clearrange/checks=true/rangeTs=true failed with artifacts on release-23.1 @ f462a0e04526d90b004b6865938587a1bec28c21:

test artifacts and logs in: /artifacts/clearrange/checks=true/rangeTs=true/run_1
(cluster.go:2013).Run: output in run_132341.176111982_n1-10_cockroach-workload-r: ./cockroach workload run kv --concurrency=32 --duration=1h returned: context canceled
(monitor.go:127).Wait: monitor failure: monitor command failure: unexpected node event: 10: dead (exit status 7)

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/storage

This test on roachdash | Improve this report!

Jira issue: CRDB-28345

@cockroach-teamcity cockroach-teamcity added branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels May 30, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone May 30, 2023
@blathers-crl blathers-crl bot added the T-storage Storage Team label May 30, 2023
@jbowens
Collaborator

jbowens commented May 30, 2023

Looks like a divergence in SysBytes. This is 23.1 so I think the divergence is expected, but I don't think the fataling is, looking at the PR description of #99244.

F230530 13:55:11.988297 13316011 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n10,merge,s10,r881/2:‹/Table/106/1/1{52597…-70335…}›] 1550  found a delta of {ContainsEstimates:0 LastUpdateNanos:1685454810162835921 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:0 RangeKeyBytes:0 RangeValCount:0 RangeValBytes:0 SysBytes:6 SysCount:0 AbortSpanBytes:0}
F230530 13:55:11.988297 13316011 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n10,merge,s10,r881/2:‹/Table/106/1/1{52597…-70335…}›] 1550 !goroutine 13316011 [running]:
F230530 13:55:11.988297 13316011 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n10,merge,s10,r881/2:‹/Table/106/1/1{52597…-70335…}›] 1550 !github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0x1)
F230530 13:55:11.988297 13316011 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n10,merge,s10,r881/2:‹/Table/106/1/1{52597…-70335…}›] 1550 !	github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0x89
F230530 13:55:11.988297 13316011 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n10,merge,s10,r881/2:‹/Table/106/1/1{52597…-70335…}›] 1550 !github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xc000dfe1e0, {{{0xc0043baba0, 0x24}, {0x0, 0x0}, {0x5d7fdf0, 0x1}, {0x5f14a9b, 0x2}}, 0x1763f05001ffee51, ...})
F230530 13:55:11.988297 13316011 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n10,merge,s10,r881/2:‹/Table/106/1/1{52597…-70335…}›] 1550 !	github.com/cockroachdb/cockroach/pkg/util/log/clog.go:262 +0x97
F230530 13:55:11.988297 13316011 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n10,merge,s10,r881/2:‹/Table/106/1/1{52597…-70335…}›] 1550 !github.com/cockroachdb/cockroach/pkg/util/log.logfDepthInternal({0x7290258, 0xc006ca6120}, 0x2, 0x4, 0x0, 0x0?, {0x5d0d24e, 0x14}, {0xc0274903b8, 0x1, ...})
F230530 13:55:11.988297 13316011 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n10,merge,s10,r881/2:‹/Table/106/1/1{52597…-70335…}›] 1550 !	github.com/cockroachdb/cockroach/pkg/util/log/channels.go:106 +0x645
F230530 13:55:11.988297 13316011 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n10,merge,s10,r881/2:‹/Table/106/1/1{52597…-70335…}›] 1550 !github.com/cockroachdb/cockroach/pkg/util/log.logfDepth(...)
F230530 13:55:11.988297 13316011 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n10,merge,s10,r881/2:‹/Table/106/1/1{52597…-70335…}›] 1550 !	github.com/cockroachdb/cockroach/pkg/util/log/channels.go:39
F230530 13:55:11.988297 13316011 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n10,merge,s10,r881/2:‹/Table/106/1/1{52597…-70335…}›] 1550 !github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
F230530 13:55:11.988297 13316011 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n10,merge,s10,r881/2:‹/Table/106/1/1{52597…-70335…}›] 1550 !	github.com/cockroachdb/cockroach/bazel-out/k8-opt/bin/pkg/util/log/log_channels_generated.go:848
F230530 13:55:11.988297 13316011 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n10,merge,s10,r881/2:‹/Table/106/1/1{52597…-70335…}›] 1550 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).checkConsistencyImpl(0xc001a12580, {0x7290258, 0xc006ca6120}, {{{}, {0xc007328b70, 0x6, 0x8}, {0x0, 0x0, 0x0}, ...}, ...})
F230530 13:55:11.988297 13316011 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n10,merge,s10,r881/2:‹/Table/106/1/1{52597…-70335…}›] 1550 !	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_consistency.go:225 +0xd6c
F230530 13:55:11.988297 13316011 kv/kvserver/replica_consistency.go:225 ⋮ [T1,n10,merge,s10,r881/2:‹/Table/106/1/1{52597…-70335…}›] 1550 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).CheckConsistency(0x0?, {0x7290258, 0xc006ca6120}, {{{}, {0x0, 0x0, 0x0}, {0x0, 0x0, 0x0}, ...}, ...})
F
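
To make the diagnosis above easier to reproduce, here is a minimal Go sketch that pulls the non-zero counters out of a stats delta as printed in the fatal log line. `nonZeroFields` is a hypothetical helper written for this illustration and is not part of CockroachDB. It assumes the delta is a space-separated list of `Name:Value` integer pairs inside braces, and that `LastUpdateNanos` is a timestamp rather than a counter, so it is ignored.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// nonZeroFields is a hypothetical helper (not a CockroachDB API). It parses a
// "{Name:Value Name:Value ...}" string as printed in the log line and returns
// the fields whose integer value is non-zero, skipping names listed in ignore.
func nonZeroFields(delta string, ignore map[string]bool) map[string]int64 {
	out := make(map[string]int64)
	delta = strings.Trim(strings.TrimSpace(delta), "{}")
	for _, kv := range strings.Fields(delta) {
		name, val, ok := strings.Cut(kv, ":")
		if !ok || ignore[name] {
			continue
		}
		n, err := strconv.ParseInt(val, 10, 64)
		if err != nil || n == 0 {
			continue // skip unparsable or zero-valued fields
		}
		out[name] = n
	}
	return out
}

func main() {
	// Delta copied from the fatal log line in this issue.
	delta := "{ContainsEstimates:0 LastUpdateNanos:1685454810162835921 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:0 RangeKeyBytes:0 RangeValCount:0 RangeValBytes:0 SysBytes:6 SysCount:0 AbortSpanBytes:0}"

	// Assumption: LastUpdateNanos is a timestamp, not a divergence counter.
	ignore := map[string]bool{"LastUpdateNanos": true}
	fmt.Println(nonZeroFields(delta, ignore))
}
```

For the delta in this report, the only field left after filtering is `SysBytes`, with a value of 6.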

@jbowens jbowens added A-kv-replication Relating to Raft, consensus, and coordination. T-kv-replication labels May 30, 2023
@blathers-crl

blathers-crl bot commented May 30, 2023

cc @cockroachdb/replication

@erikgrinaker
Contributor

> I think the divergence is expected, but I don't think the fataling is

This test explicitly enables the fatal assertion:

settings.Env = append(settings.Env, []string{"COCKROACH_CONSISTENCY_AGGRESSIVE=true", "COCKROACH_ENFORCE_CONSISTENT_STATS=true"}...)

We've fixed/mitigated these races on master in #99244 and #103719, but aren't planning to backport since the risk/benefit isn't justified.

These races are also more likely to get tickled by expiration leases, which were metamorphically enabled on master and release-23.1 via #103190.

You can consider disabling the fatal stats assertion on the release branches.
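
As a rough illustration of that suggestion, the sketch below builds the environment list with the stats enforcement made optional. `consistencyEnv` and its `enforceStats` parameter are hypothetical names invented for this example, and `settingsEnv` stands in for the test's `settings.Env`. Only the two environment variable strings come from the test code quoted above. Keeping `COCKROACH_CONSISTENCY_AGGRESSIVE` enabled while dropping the stats enforcement is an assumption, not necessarily what the eventual fix does.

```go
package main

import "fmt"

// consistencyEnv is a hypothetical helper (not part of roachtest). It returns
// the environment variables for the consistency checks, adding the fatal
// stats assertion only when enforceStats is true.
func consistencyEnv(enforceStats bool) []string {
	env := []string{"COCKROACH_CONSISTENCY_AGGRESSIVE=true"}
	if enforceStats {
		env = append(env, "COCKROACH_ENFORCE_CONSISTENT_STATS=true")
	}
	return env
}

func main() {
	// settingsEnv stands in for settings.Env in the quoted test code.
	var settingsEnv []string

	// On a release branch without the race fixes, skip the fatal assertion.
	settingsEnv = append(settingsEnv, consistencyEnv(false)...)
	fmt.Println(settingsEnv)
}
```

With `enforceStats` set to `false`, the consistency checks would still run under this sketch, but a stats delta such as the `SysBytes` one above would no longer be expected to kill the node.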

@erikgrinaker erikgrinaker added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-testing Testing tools and infrastructure and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels May 31, 2023
jbowens added a commit to jbowens/cockroach that referenced this issue May 31, 2023
Previously, the checks=true configuration of the clearrange roachtest would
fatal on stats divergences. Due to cockroachdb#93896, this test can fatal with a SysBytes
divergence. This is fixed on master, but the fix will not be backported to 23.1
or 22.2. Disable the enforcement of consistent stats on this branch.

Fixes cockroachdb#104078.
Informs cockroachdb#104011.
Epic: none
Release note: none
Release justification: non-production code changes
@jbowens
Collaborator

jbowens commented May 31, 2023

Thanks, should've looked at that PR; I didn't realize it was kvnemesis specific. Created #104152.

@jbowens jbowens closed this as completed May 31, 2023