
roachtest: import/tpch/nodes=8 failed #98970

Closed

cockroach-teamcity opened this issue Mar 19, 2023 · 3 comments
Labels: branch-master (Failures and bugs on the master branch), C-test-failure (Broken test, automatically or manually discovered), O-roachtest, O-robot (Originated from a bot), release-blocker (Indicates a release-blocker; use with a branch-release-2x.x label to denote which branch is blocked), T-storage (Storage Team), X-infra-flake (the automatically generated issue was closed due to an infrastructure problem, not a product issue)
Milestone: 23.1

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Mar 19, 2023

roachtest.import/tpch/nodes=8 failed with artifacts on master @ 53dbb86acb1d48309530181b94838faf937084d3:

test artifacts and logs in: /artifacts/import/tpch/nodes=8/run_1
(monitor.go:127).Wait: monitor failure: monitor task failed: dial tcp 34.148.51.83:26257: connect: connection refused

Parameters: ROACHTEST_cloud=gce, ROACHTEST_cpu=4, ROACHTEST_encrypted=false, ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

/cc @cockroachdb/sql-sessions

This test on roachdash | Improve this report!

Jira issue: CRDB-25633

cockroach-teamcity added the branch-master, C-test-failure, O-roachtest, O-robot, release-blocker, and T-sql-foundations (SQL Foundations Team, formerly SQL Schema + SQL Sessions) labels on Mar 19, 2023
cockroach-teamcity added this to the 23.1 milestone on Mar 19, 2023
@rafiss
Collaborator

rafiss commented Mar 20, 2023

The logs for node 3 in the previous run show repeated "disk write failed while updating node liveness" errors, and the node then crashes with a fatal "file write stall detected: disk slowness detected: sync on file ‹/mnt/data1/cockroach/auxiliary/sideloading/r0XXXX/r442/i12.t6› has been ongoing for 20.0s".

I'm moving this over for @cockroachdb/storage to look at.
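
For context on the warnings below: each liveness heartbeat attempt is wrapped in a 3-second timeout (the contextutil.RunWithTimeout frame in the stack traces), so a stalled engine sync surfaces as a context deadline exceeded error wrapped with the operation name. A minimal sketch of that pattern, using hypothetical names rather than the actual liveness code:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runWithTimeout mimics the contextutil.RunWithTimeout pattern seen in the
// stack traces: run op under a context with the given timeout, and wrap any
// error with the operation name, elapsed time, and configured timeout.
func runWithTimeout(ctx context.Context, name string, timeout time.Duration,
	op func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	start := time.Now()
	if err := op(ctx); err != nil {
		return fmt.Errorf("operation %q timed out after %.3fs (given timeout %s): %w",
			name, time.Since(start).Seconds(), timeout, err)
	}
	return nil
}

func main() {
	// Hypothetical stand-in for a heartbeat whose engine sync has stalled:
	// it blocks until the 3s deadline fires.
	heartbeat := func(ctx context.Context) error {
		select {
		case <-time.After(10 * time.Second): // pretend the disk sync never returns
			return nil
		case <-ctx.Done():
			return fmt.Errorf("disk write failed while updating node liveness: %w", ctx.Err())
		}
	}
	err := runWithTimeout(context.Background(), "node liveness heartbeat", 3*time.Second, heartbeat)
	fmt.Println(err)
	fmt.Println(errors.Is(err, context.DeadlineExceeded)) // true
}
```

The relevant excerpt from the node 3 logs: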

W230319 05:57:12.576891 240 kv/kvserver/liveness/liveness.go:890 ⋮ [T1,n3,liveness-hb] 540  slow heartbeat took 3.000536104s; err=disk write failed while updating node liveness: interrupted during singleflight ‹engine sync:0›: context deadline exceeded
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541  failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 3.001s (given timeout 3s)›: disk write failed while updating node liveness: interrupted during singleflight ‹engine sync:0›: context deadline exceeded
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +(1) ‹operation "node liveness heartbeat" timed out after 3.001s (given timeout 3s)›
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +Wraps: (2) attached stack trace
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  -- stack trace:
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).updateLiveness
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:1294
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | [...repeated from below...]
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +Wraps: (3) disk write failed while updating node liveness
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +Wraps: (4) attached stack trace
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  -- stack trace:
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight.(*call).result
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | 	github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:272
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight.Future.WaitForResult
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | 	github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:234
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).updateLiveness
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:1292
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).heartbeatInternal
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:965
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).Start.func1.1
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:780
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | github.com/cockroachdb/cockroach/pkg/util/contextutil.RunWithTimeout
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | 	github.com/cockroachdb/cockroach/pkg/util/contextutil/context.go:91
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).Start.func1
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:763
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | 	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | runtime.goexit
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +  | 	GOROOT/src/runtime/asm_amd64.s:1594
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +Wraps: (5) interrupted during singleflight ‹engine sync:0›
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +Wraps: (6) context deadline exceeded
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +Error types: (1) *contextutil.TimeoutError (2) *withstack.withStack (3) *errutil.withPrefix (4) *withstack.withStack (5) *errutil.withPrefix (6) context.deadlineExceededError
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +An inability to maintain liveness will prevent a node from participating in a
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +cluster. If this problem persists, it may be a sign of resource starvation or
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +of network connectivity problems. For help troubleshooting, visit:
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +
W230319 05:57:12.576985 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 541 +    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
W230319 05:57:15.583314 240 kv/kvserver/liveness/liveness.go:890 ⋮ [T1,n3,liveness-hb] 542  slow heartbeat took 3.004753733s; err=disk write failed while updating node liveness: interrupted during singleflight ‹engine sync:0›: context deadline exceeded
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543  failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 3.005s (given timeout 3s)›: disk write failed while updating node liveness: interrupted during singleflight ‹engine sync:0›: context deadline exceeded
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +(1) ‹operation "node liveness heartbeat" timed out after 3.005s (given timeout 3s)›
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +Wraps: (2) attached stack trace
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  -- stack trace:
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).updateLiveness
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:1294
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | [...repeated from below...]
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +Wraps: (3) disk write failed while updating node liveness
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +Wraps: (4) attached stack trace
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  -- stack trace:
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight.(*call).result
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | 	github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:272
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight.Future.WaitForResult
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | 	github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:234
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).updateLiveness
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:1292
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).heartbeatInternal
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:965
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).Start.func1.1
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:780
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | github.com/cockroachdb/cockroach/pkg/util/contextutil.RunWithTimeout
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | 	github.com/cockroachdb/cockroach/pkg/util/contextutil/context.go:91
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).Start.func1
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:763
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | 	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | runtime.goexit
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +  | 	GOROOT/src/runtime/asm_amd64.s:1594
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +Wraps: (5) interrupted during singleflight ‹engine sync:0›
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +Wraps: (6) context deadline exceeded
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +Error types: (1) *contextutil.TimeoutError (2) *withstack.withStack (3) *errutil.withPrefix (4) *withstack.withStack (5) *errutil.withPrefix (6) context.deadlineExceededError
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +An inability to maintain liveness will prevent a node from participating in a
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +cluster. If this problem persists, it may be a sign of resource starvation or
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +of network connectivity problems. For help troubleshooting, visit:
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +
W230319 05:57:15.583425 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 543 +    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
I230319 05:57:16.575210 222 kv/kvserver/store_gossip.go:134 ⋮ [T1,n3,s3,r1/5:‹/{Min-System/NodeL…}›] 544  could not gossip first range descriptor: [NotLeaseHolderError] ‹lease acquisition canceled because context canceled›; r1: replica (n3,s3):5 not lease holder; lease holder unknown
I230319 05:57:16.602390 167 gossip/client.go:129 ⋮ [T1,n3] 545  closing client to n1 (‹34.138.99.167:26257›): stopping outgoing client to n1 (‹34.138.99.167:26257›); already have incoming
W230319 05:57:18.583731 240 kv/kvserver/liveness/liveness.go:890 ⋮ [T1,n3,liveness-hb] 546  slow heartbeat took 3.000173178s; err=disk write failed while updating node liveness: interrupted during singleflight ‹engine sync:0›: context deadline exceeded
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547  failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 3s (given timeout 3s)›: disk write failed while updating node liveness: interrupted during singleflight ‹engine sync:0›: context deadline exceeded
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +(1) ‹operation "node liveness heartbeat" timed out after 3s (given timeout 3s)›
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +Wraps: (2) attached stack trace
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  -- stack trace:
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).updateLiveness
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:1294
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | [...repeated from below...]
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +Wraps: (3) disk write failed while updating node liveness
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +Wraps: (4) attached stack trace
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  -- stack trace:
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight.(*call).result
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | 	github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:272
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight.Future.WaitForResult
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | 	github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:234
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).updateLiveness
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:1292
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).heartbeatInternal
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:965
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).Start.func1.1
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:780
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | github.com/cockroachdb/cockroach/pkg/util/contextutil.RunWithTimeout
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | 	github.com/cockroachdb/cockroach/pkg/util/contextutil/context.go:91
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).Start.func1
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:763
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | 	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | runtime.goexit
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +  | 	GOROOT/src/runtime/asm_amd64.s:1594
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +Wraps: (5) interrupted during singleflight ‹engine sync:0›
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +Wraps: (6) context deadline exceeded
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +Error types: (1) *contextutil.TimeoutError (2) *withstack.withStack (3) *errutil.withPrefix (4) *withstack.withStack (5) *errutil.withPrefix (6) context.deadlineExceededError
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +An inability to maintain liveness will prevent a node from participating in a
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +cluster. If this problem persists, it may be a sign of resource starvation or
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +of network connectivity problems. For help troubleshooting, visit:
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +
W230319 05:57:18.583857 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 547 +    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
W230319 05:57:21.585010 240 kv/kvserver/liveness/liveness.go:890 ⋮ [T1,n3,liveness-hb] 548  slow heartbeat took 3.000948739s; err=context deadline exceeded
W230319 05:57:21.585076 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 549  failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 3.001s (given timeout 3s)›: context deadline exceeded
W230319 05:57:21.585076 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 549 +(1) ‹operation "node liveness heartbeat" timed out after 3.001s (given timeout 3s)›
W230319 05:57:21.585076 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 549 +Wraps: (2) context deadline exceeded
W230319 05:57:21.585076 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 549 +Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
W230319 05:57:21.585076 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 549 +
W230319 05:57:21.585076 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 549 +An inability to maintain liveness will prevent a node from participating in a
W230319 05:57:21.585076 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 549 +cluster. If this problem persists, it may be a sign of resource starvation or
W230319 05:57:21.585076 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 549 +of network connectivity problems. For help troubleshooting, visit:
W230319 05:57:21.585076 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 549 +
W230319 05:57:21.585076 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 549 +    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
I230319 05:57:23.667972 119586 kv/bulk/sst_batcher.go:864 ⋮ [T1,n3,f‹02649193›,job=849127334523699201,distsql.gateway=‹1›] 550  SSTable cannot be added spanning range bounds /Table/106/4/‹947030›/‹164655075›/‹3›, retrying...
W230319 05:57:24.586141 240 kv/kvserver/liveness/liveness.go:890 ⋮ [T1,n3,liveness-hb] 551  slow heartbeat took 3.001004298s; err=context deadline exceeded
W230319 05:57:24.586218 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 552  failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 3.001s (given timeout 3s)›: context deadline exceeded
W230319 05:57:24.586218 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 552 +(1) ‹operation "node liveness heartbeat" timed out after 3.001s (given timeout 3s)›
W230319 05:57:24.586218 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 552 +Wraps: (2) context deadline exceeded
W230319 05:57:24.586218 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 552 +Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
W230319 05:57:24.586218 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 552 +
W230319 05:57:24.586218 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 552 +An inability to maintain liveness will prevent a node from participating in a
W230319 05:57:24.586218 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 552 +cluster. If this problem persists, it may be a sign of resource starvation or
W230319 05:57:24.586218 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 552 +of network connectivity problems. For help troubleshooting, visit:
W230319 05:57:24.586218 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 552 +
W230319 05:57:24.586218 240 kv/kvserver/liveness/liveness.go:792 ⋮ [T1,n3,liveness-hb] 552 +    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
W230319 05:57:26.400549 119430 kv/kvserver/spanlatch/manager.go:559 ⋮ [T1,n3,s3,r13/2:‹/Table/1{1-2}›] 553  have been waiting 15s to acquire ‹read› latch ‹/Table/11/2/"\x80"/36/1/2023-03-19T06:02:11.609852Z/6/0@0,0›, held by ‹write› latch ‹/Table/11/2/"\x80"/36/1/2023-03-19T06:02:11.609852Z/6/[email protected],0›
W230319 05:57:26.400595 119434 kv/kvserver/spanlatch/manager.go:559 ⋮ [T1,n3,s3,r13/2:‹/Table/1{1-2}›] 554  have been waiting 15s to acquire ‹read› latch ‹/Table/11/2/"\x80"/3/1/2023-03-19T06:02:11.322441Z/6/0@0,0›, held by ‹write› latch ‹/Table/11/2/"\x80"/3/1/2023-03-19T06:02:11.322441Z/6/[email protected],0›
W230319 05:57:26.430241 119392 kv/kvserver/spanlatch/manager.go:559 ⋮ [T1,n3,s3,r13/2:‹/Table/1{1-2}›] 555  have been waiting 15s to acquire ‹read› latch ‹/Table/11/2/"\x80"/14/1/2023-03-19T06:01:58.29435Z/6/0@0,0›, held by ‹write› latch ‹/Table/11/2/"\x80"/14/1/2023-03-19T06:01:58.29435Z/6/[email protected],0›
W230319 05:57:26.430328 119388 kv/kvserver/spanlatch/manager.go:559 ⋮ [T1,n3,s3,r13/2:‹/Table/1{1-2}›] 557  have been waiting 15s to acquire ‹read› latch ‹/Table/11/2/"\x80"/7/1/2023-03-19T06:02:01.759215Z/6/0@0,0›, held by ‹write› latch ‹/Table/11/2/"\x80"/7/1/2023-03-19T06:02:01.759215Z/6/[email protected],0›
W230319 05:57:26.430294 119390 kv/kvserver/spanlatch/manager.go:559 ⋮ [T1,n3,s3,r13/2:‹/Table/1{1-2}›] 556  have been waiting 15s to acquire ‹read› latch ‹/Table/11/2/"\x80"/33/1/2023-03-19T06:02:11.214215Z/6/0@0,0›, held by ‹write› latch ‹/Table/11/2/"\x80"/33/1/2023-03-19T06:02:11.214215Z/6/[email protected],0›
I230319 05:57:27.035559 119194 1@storage/pebble.go:1210 ⋮ [T1,n3] 558  the server is terminating due to a fatal error (see the DEV channel for details)
F230319 05:57:27.035596 119194 storage/pebble.go:1210 ⋮ [T1,n3] 559  file write stall detected: disk slowness detected: sync on file ‹/mnt/data1/cockroach/auxiliary/sideloading/r0XXXX/r442/i12.t6› has been ongoing for 20.0s
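
The fatal F-level line at the end is the storage engine's disk-stall protection firing: file writes and syncs are timed, and if one stalls past a threshold (20s here) the process terminates itself rather than continue as an unresponsive replica. A minimal sketch of that watchdog idea, with hypothetical names and threshold rather than Pebble's actual API:

```go
package main

import (
	"fmt"
	"log"
	"os"
	"time"
)

// syncWithWatchdog runs op (e.g. a file sync) while a timer watches for it
// to exceed maxStall. If the op stalls past the threshold, the process
// terminates fatally, mirroring the "file write stall detected" crash above.
func syncWithWatchdog(desc string, maxStall time.Duration, op func() error) error {
	done := make(chan error, 1)
	go func() { done <- op() }()
	select {
	case err := <-done:
		return err
	case <-time.After(maxStall):
		log.Fatalf("file write stall detected: %s has been ongoing for %.1fs",
			desc, maxStall.Seconds())
		return nil // unreachable: log.Fatalf exits the process
	}
}

func main() {
	f, err := os.CreateTemp("", "stall-demo")
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove(f.Name())
	defer f.Close()
	// A healthy sync completes well under the 20s threshold.
	err = syncWithWatchdog(fmt.Sprintf("sync on file %s", f.Name()), 20*time.Second, f.Sync)
	fmt.Println("sync completed, err =", err)
}
```

Crashing on a detected stall is deliberate: a node with a wedged disk would otherwise keep holding leases and latches (see the spanlatch warnings above) while being unable to make progress.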

rafiss removed the T-sql-foundations label on Mar 20, 2023
blathers-crl (bot) added the T-storage label on Mar 20, 2023
@jbowens
Collaborator

jbowens commented Mar 20, 2023

Took a look; this appears to be a legitimate disk stall, which is not uncommon during import workloads that overwhelm the GCE persistent disk (PD).

jbowens closed this as not planned on Mar 20, 2023
jbowens added the X-infra-flake label on Mar 20, 2023
exalate-issue-sync (bot) removed the X-infra-flake label on Mar 20, 2023
jbowens added the X-infra-flake label on Mar 20, 2023
@renatolabs
Contributor

cc #97968 for tracking.
