
ccl/backupccl: TestBackupAndRestoreJobDescription failed #102519

Closed
cockroach-teamcity opened this issue Apr 28, 2023 · 6 comments · Fixed by #102591
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-disaster-recovery


cockroach-teamcity commented Apr 28, 2023

ccl/backupccl.TestBackupAndRestoreJobDescription failed with artifacts on master @ 3b1bc872bb4bf8e5eb2ad8790ba8ec781167dd1f:

Fatal error:

panic: logging already active; first use:

Stack:

goroutine 44712184 [running]:
runtime/debug.Stack()
	GOROOT/src/runtime/debug/stack.go:24 +0x65
github.com/cockroachdb/cockroach/pkg/util/log.setActive()
	github.com/cockroachdb/cockroach/pkg/util/log/clog.go:418 +0xa7
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xc0001aea40, {{{0xc01001a5a0, 0x24}, {0x0, 0x0}, {0x60fd886, 0x1}, {0x60fd887, 0x1}}, 0x175a0528e13fe7c2, ...})
	github.com/cockroachdb/cockroach/pkg/util/log/clog.go:250 +0x50
github.com/cockroachdb/cockroach/pkg/util/log.logfDepthInternal({0x768e5c0, 0xc005d82960}, 0x2, 0x1, 0x0, 0x0?, {0x60b90dc, 0x1c}, {0xc003759b20, 0x1, ...})
	github.com/cockroachdb/cockroach/pkg/util/log/channels.go:106 +0x605
github.com/cockroachdb/cockroach/pkg/util/log.logfDepth(...)
	github.com/cockroachdb/cockroach/pkg/util/log/channels.go:39
github.com/cockroachdb/cockroach/pkg/util/log.Infof(...)
	github.com/cockroachdb/cockroach/bazel-out/k8-fastbuild/bin/pkg/util/log/log_channels_generated.go:277
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processReady(0xc001f48a80, 0xc00d3d10b0?)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:656 +0x337
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftSchedulerShard).worker(0xc012145ef0, {0x768e5c0, 0xc0185490e0}, {0x76722e0, 0xc001f48a80}, 0xc017bbdb00)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:392 +0x195
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).Start.func2({0x768e5c0?, 0xc0185490e0?})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:299 +0x45
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470 +0x146
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:461 +0x43b [recovered]
	panic: logging already active; first use:
goroutine 44712184 [running]:
runtime/debug.Stack()
	GOROOT/src/runtime/debug/stack.go:24 +0x65
github.com/cockroachdb/cockroach/pkg/util/log.setActive()
	github.com/cockroachdb/cockroach/pkg/util/log/clog.go:418 +0xa7
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xc0001aea40, {{{0xc01001a5a0, 0x24}, {0x0, 0x0}, {0x60fd886, 0x1}, {0x60fd887, 0x1}}, 0x175a0528e13fe7c2, ...})
	github.com/cockroachdb/cockroach/pkg/util/log/clog.go:250 +0x50
github.com/cockroachdb/cockroach/pkg/util/log.logfDepthInternal({0x768e5c0, 0xc005d82960}, 0x2, 0x1, 0x0, 0x0?, {0x60b90dc, 0x1c}, {0xc003759b20, 0x1, ...})
	github.com/cockroachdb/cockroach/pkg/util/log/channels.go:106 +0x605
github.com/cockroachdb/cockroach/pkg/util/log.logfDepth(...)
	github.com/cockroachdb/cockroach/pkg/util/log/channels.go:39
github.com/cockroachdb/cockroach/pkg/util/log.Infof(...)
	github.com/cockroachdb/cockroach/bazel-out/k8-fastbuild/bin/pkg/util/log/log_channels_generated.go:277
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processReady(0xc001f48a80, 0xc00d3d10b0?)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:656 +0x337
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftSchedulerShard).worker(0xc012145ef0, {0x768e5c0, 0xc0185490e0}, {0x76722e0, 0xc001f48a80}, 0xc017bbdb00)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:392 +0x195
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).Start.func2({0x768e5c0?, 0xc0185490e0?})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:299 +0x45
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470 +0x146
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:461 +0x43b
Log preceding fatal error

=== RUN   TestBackupAndRestoreJobDescription
I230428 06:24:43.857009 39933109 kv/kvserver/store_raft.go:656  [T1,n1,s1,r35/1:/Table/3{3-4},raft] 50  raft ready handling: 0.62s [append=0.00s, apply=0.00s, , other=0.62s], wrote []; node might be overloaded
I230428 06:24:43.856679 39932836 kv/kvserver/store_raft.go:656  [T1,n1,s1,r62/1:/Table/6{1-2},raft] 51  raft ready handling: 0.65s [append=0.00s, apply=0.00s, , other=0.65s], wrote []; node might be overloaded
I230428 06:24:43.858057 39933055 kv/kvserver/store_raft.go:656  [T1,n1,s1,r64/1:/Table/106/1/{500-600},raft] 52  raft ready handling: 0.58s [append=0.00s, apply=0.00s, non-blocking-sync=0.00s, other=0.58s], wrote [append-batch=12 B, ]; node might be overloaded
W230428 06:24:43.870565 44882640 2@rpc/clock_offset.go:276  [T1,n3,rnode=1,raddr=127.0.0.1:41031,class=default,rpc] 57  latency jump (prev avg 173.69ms, current 649.55ms)
I230428 06:24:43.858391 84776319 kv/txn.go:883  [T1,n2] 53  async rollback failed: interrupted during singleflight range lookup: liveness-:false:4: context deadline exceeded
I230428 06:24:43.866110 39932661 kv/kvserver/store_raft.go:656  [T1,n1,s1,r31/1:/{Table/29-NamespaceTab…},raft] 54  raft ready handling: 0.66s [append=0.00s, apply=0.00s, , other=0.66s], wrote []; node might be overloaded
I230428 06:24:43.850589 44876293 kv/kvserver/store_raft.go:656  [T1,n3,s3,r43/2:/Table/4{1-2},raft] 55  raft ready handling: 0.63s [append=0.00s, apply=0.00s, , other=0.63s], wrote []; node might be overloaded
I230428 06:24:43.867678 84766705 kv/txn.go:883  [T1,n3] 56  async rollback failed: aborted during DistSender.Send: context deadline exceeded
I230428 06:24:43.887509 44876709 kv/kvserver/store_raft.go:656  [T1,n3,s3,r33/3:/{NamespaceTab…-Table/32},raft] 58  raft ready handling: 0.71s [append=0.00s, apply=0.00s, , other=0.71s], wrote []; node might be overloaded
I230428 06:24:43.890592 39933104 kv/kvserver/store_raft.go:656  [T1,n1,s1,r59/1:/Table/5{8-9},raft] 59  raft ready handling: 0.68s [append=0.00s, apply=0.00s, , other=0.68s], wrote []; node might be overloaded
I230428 06:24:43.885217 86564548 gossip/client.go:129  [T1,n3] 60  closing client to n2 (127.0.0.1:41169): stopping outgoing client to n2 (127.0.0.1:41169); already have incoming
I230428 06:24:43.894955 43793186 kv/kvserver/store_raft.go:656  [T1,n1,s1,r47/1:/Table/4{5-6},raft] 61  raft ready handling: 1.04s [append=0.00s, apply=0.93s, , other=0.11s], wrote [append-batch=50 B, apply=390 B (2 in 2 batches)]; node might be overloaded
I230428 06:24:43.907264 39932828 kv/kvserver/store_raft.go:656  [T1,n1,s1,r52/1:/Table/5{1-2},raft] 62  raft ready handling: 0.75s [append=0.00s, apply=0.00s, , other=0.75s], wrote []; node might be overloaded
I230428 06:24:43.899894 43792909 kv/kvserver/store_raft.go:656  [T1,n1,s1,r50/1:/Table/{48-50},raft] 63  raft ready handling: 0.70s [append=0.00s, apply=0.05s, non-blocking-sync=0.00s, other=0.64s], wrote [append-batch=590 B, append-ent=438 B (2), apply=219 B (1)]; node might be overloaded
I230428 06:24:43.917306 43792981 kv/kvserver/store_raft.go:656  [T1,n1,s1,r51/1:/Table/5{0-1},raft] 64  raft ready handling: 0.92s [append=0.00s, apply=0.81s, , other=0.11s], wrote [append-batch=50 B, apply=209 B (1)]; node might be overloaded
I230428 06:24:43.926541 39933186 kv/kvserver/store_raft.go:656  [T1,n1,s1,r77/1:/{Table/121-Max},raft] 66  raft ready handling: 1.77s [append=0.00s, apply=0.00s, , other=1.77s], wrote []; node might be overloaded
I230428 06:24:43.912193 44712184 kv/kvserver/store_raft.go:656  [T1,n2,s2,r61/3:/Table/6{0-1},raft] 65  raft ready handling: 0.78s [append=0.00s, apply=0.00s, sync=0.00s, other=0.78s], wrote [append-batch=286 B, append-ent=223 B (1), ]; node might be overloaded
W230428 06:24:43.930123 44713195 kv/kvserver/liveness/liveness.go:810  [T1,n2,liveness-hb] 68  failed node liveness heartbeat: operation "node liveness heartbeat" timed out after 4.472s (given timeout 3s): result is ambiguous: context done during DistSender.Send: ba: ConditionalPut [/System/NodeLiveness/2,/Min), EndTxn(commit modified-span (node-liveness)) [/System/NodeLiveness/2], [txn: 1808773f], [can-forward-ts] RPC error: grpc: context deadline exceeded [code 4/DeadlineExceeded]
W230428 06:24:43.930123 44713195 kv/kvserver/liveness/liveness.go:810  [T1,n2,liveness-hb] 68 +
W230428 06:24:43.930123 44713195 kv/kvserver/liveness/liveness.go:810  [T1,n2,liveness-hb] 68 +An inability to maintain liveness will prevent a node from participating in a
W230428 06:24:43.930123 44713195 kv/kvserver/liveness/liveness.go:810  [T1,n2,liveness-hb] 68 +cluster. If this problem persists, it may be a sign of resource starvation or
W230428 06:24:43.930123 44713195 kv/kvserver/liveness/liveness.go:810  [T1,n2,liveness-hb] 68 +of network connectivity problems. For help troubleshooting, visit:
W230428 06:24:43.930123 44713195 kv/kvserver/liveness/liveness.go:810  [T1,n2,liveness-hb] 68 +
W230428 06:24:43.930123 44713195 kv/kvserver/liveness/liveness.go:810  [T1,n2,liveness-hb] 68 +    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
I230428 06:24:43.928784 39932990 kv/kvserver/store_raft.go:656  [T1,n1,s1,r58/1:/Table/5{7-8},raft] 67  raft ready handling: 1.46s [append=0.00s, apply=0.00s, sync=0.00s, other=1.46s], wrote [append-batch=296 B, append-ent=233 B (1), ]; node might be overloaded
I230428 06:24:43.964365 44876117 kv/kvserver/store_raft.go:656  [T1,n3,s3,r42/2:/Table/4{0-1},raft] 69  raft ready handling: 1.03s [append=0.00s, apply=0.79s, , other=0.24s], wrote [append-batch=50 B, apply=438 B (2 in 2 batches)]; node might be overloaded
I230428 06:24:43.935894 39932861 kv/kvserver/store_raft.go:656  [T1,n1,s1,r32/1:/NamespaceTable/{30-Max},raft] 70  raft ready handling: 0.74s [append=0.00s, apply=0.00s, , other=0.74s], wrote []; node might be overloaded
--- FAIL: TestBackupAndRestoreJobDescription (0.13s)

Parameters: TAGS=bazel,gss,deadlock

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/disaster-recovery


Jira issue: CRDB-27486

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-disaster-recovery labels Apr 28, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Apr 28, 2023
dt commented Apr 28, 2023

flake in logging, unrelated to backup/restore, already tracked in #102588

@dt dt closed this as completed Apr 28, 2023
erikgrinaker commented Apr 28, 2023

initialized metamorphic constant "kv.expiration_leases_only.enabled" with value true

Expiration-based leases don't usually work great with deadlock builds. The cluster runs too slow, so they can't maintain leases/leadership/liveness. Consider skipping this test under deadlock builds.

dt commented Apr 28, 2023

Unfortunately, basically all of DR's tests start a test server or test cluster and execute backup and restore statements against it, so almost all of our tests also exercise the kvserver, the SQL server, and any and all metamorphic behaviors in them. If we need to skip every test that runs kvserver metamorphic behaviors under deadlock, then we'd basically need to skip all of backupccl/*, not just this test, under deadlock?

dt commented Apr 28, 2023

Put another way: if testserver/testcluster by default shouldn't be run under deadlock, then perhaps the t.Skip should live in testserver/testcluster, rather than asking every test that uses them to know this?

erikgrinaker commented Apr 28, 2023

Yeah. I'm thinking about disabling the kv.expiration_leases_only.enabled metamorphism in deadlock builds for now, since these tests only seem to fail there.

It's a bit unfortunate to lose that combination, but I think we exercise expiration leases enough under deadlock even in non-metamorphic tests.

Longer-term, maybe disabling testserver or testcluster under deadlock makes sense. But I'll just do this for now.


craig bot pushed a commit that referenced this issue Apr 28, 2023
102591: kvserver: disable `kv.expiration_leases_only.enabled` metamorphism under deadlock r=erikgrinaker a=erikgrinaker

TestClusters are usually so slow under deadlock detection that they're unable to maintain leases/leadership/liveness, especially with expiration leases.

Resolves #101612.
Resolves #101613.
Resolves #101625.
Resolves #101626.
Resolves #102519.

Epic: none
Release note: None

Co-authored-by: Erik Grinaker <[email protected]>