roachtest: clearrange/checks=true failed #84799

Closed
cockroach-teamcity opened this issue Jul 21, 2022 · 4 comments
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-storage Storage Team X-unactionable This was closed because it was unactionable.

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Jul 21, 2022

roachtest.clearrange/checks=true failed with artifacts on master @ 457d724622e4fa2e62d6f4e7926509dbc7d18511:

		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |  1121.0s        0          537.7         3995.7      1.8      2.5      3.9      7.6 write
		  |  1122.0s        0          547.5         3992.6      1.8      2.2      3.5      6.3 write
		  |  1123.0s        0          422.7         3989.4      1.9      3.1     10.5     75.5 write
		  |  1124.0s        0          415.9         3986.2      1.9      3.1      5.8     96.5 write
		  |  1125.0s        0           64.0         3982.7      2.0      4.7    104.9    151.0 write
		  |  1126.0s        0            0.0         3979.2      0.0      0.0      0.0      0.0 write
		  | I220721 08:28:42.021053 236 workload/pgx_helpers.go:79  [-] 4  pgx logger [error]: Exec logParams=map[args:[4508057783916293699 19] err:ERROR: result is ambiguous: error=rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout [propagate] (SQLSTATE 40003) pid:11352090 sql:kv-2 time:31.876444989s]
		  | I220721 08:28:42.020497 360 workload/pgx_helpers.go:79  [-] 5  pgx logger [error]: Exec logParams=map[args:[3327192390061538303 49] err:ERROR: result is ambiguous: error=rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout [propagate] (SQLSTATE 40003) pid:10787405 sql:kv-2 time:31.874449289s]
		  | I220721 08:28:42.020433 229 workload/pgx_helpers.go:79  [-] 3  pgx logger [error]: Exec logParams=map[args:[7610128679414974537 42] err:ERROR: result is ambiguous: error=rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout [exhausted] (SQLSTATE 40003) pid:10758870 sql:kv-2 time:2.206534552s]
		  | I220721 08:28:42.020534 241 workload/pgx_helpers.go:79  [-] 7  pgx logger [error]: Exec logParams=map[args:[-8566536107696145972 af] err:ERROR: result is ambiguous: error=rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout [propagate] (SQLSTATE 40003) pid:11311104 sql:kv-2 time:31.89745499s]
		  | I220721 08:28:42.020520 363 workload/pgx_helpers.go:79  [-] 6  pgx logger [error]: Exec logParams=map[args:[-9089365450409631843 92] err:ERROR: result is ambiguous: error=rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout [propagate] (SQLSTATE 40003) pid:10627904 sql:kv-2 time:31.219935764s]
		  | I220721 08:28:42.020534 369 workload/pgx_helpers.go:79  [-] 8  pgx logger [error]: Exec logParams=map[args:[-9130718193216845063 75] err:ERROR: result is ambiguous: error=rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout [propagate] (SQLSTATE 40003) pid:10898106 sql:kv-2 time:31.898524055s]
		  | I220721 08:28:42.022296 240 workload/pgx_helpers.go:79  [-] 9  pgx logger [error]: Exec logParams=map[args:[4570118766385747979 4e] err:ERROR: result is ambiguous: error=rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout [propagate] (SQLSTATE 40003) pid:11418018 sql:kv-2 time:31.839656377s]
		  | Error: ERROR: result is ambiguous: error=rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout [propagate] (SQLSTATE 40003)
		  | COMMAND_PROBLEM: exit status 1
		Wraps: (4) secondary error attachment
		  | COMMAND_PROBLEM: exit status 1
		  | (1) COMMAND_PROBLEM
		  | Wraps: (2) Node 1. Command with error:
		  |   | ``````
		  |   | ./cockroach workload run kv --concurrency=32 --duration=1h
		  |   | ``````
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	monitor.go:127,clearrange.go:206,clearrange.go:39,test_runner.go:896: monitor failure: monitor task failed: pq: query execution canceled due to statement timeout
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runClearRange
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/clearrange.go:206
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerClearRange.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/clearrange.go:39
		  | [...repeated from below...]
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (4) monitor task failed
		Wraps: (5) pq: query execution canceled due to statement timeout
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *pq.Error

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/storage

This test on roachdash | Improve this report!

Jira issue: CRDB-17872

Epic CRDB-16238

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jul 21, 2022
@cockroach-teamcity cockroach-teamcity added this to the 22.2 milestone Jul 21, 2022
@nicktrav nicktrav self-assigned this Jul 21, 2022
@nicktrav nicktrav added the T-storage Storage Team label Jul 21, 2022
@nicktrav
Collaborator

This one looks like a block device issue on n1 that resulted in a crash of that node:

F220721 08:34:59.642778 6364684 1@util/log/file.go:258 ⋮ [-] 2975  disk stall detected: unable to sync log files within 20s
F220721 08:34:59.642778 6364684 1@util/log/file.go:258 ⋮ [-] 2975 !goroutine 6364684 [running]:
F220721 08:34:59.642778 6364684 1@util/log/file.go:258 ⋮ [-] 2975 !github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0x1)
F220721 08:34:59.642778 6364684 1@util/log/file.go:258 ⋮ [-] 2975 !	github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0x8a
F220721 08:34:59.642778 6364684 1@util/log/file.go:258 ⋮ [-] 2975 !github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xc000e59060, {{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}}, 0x1703cb311f61446f, ...})
F220721 08:34:59.642778 6364684 1@util/log/file.go:258 ⋮ [-] 2975 !	github.com/cockroachdb/cockroach/pkg/util/log/clog.go:239 +0x97
F220721 08:34:59.642778 6364684 1@util/log/file.go:258 ⋮ [-] 2975 !github.com/cockroachdb/cockroach/pkg/util/log.logfDepthInternal({0x68386c8, 0xc000076088}, 0x2, 0x4, 0x1, 0x1, {0x51623e1, 0x37}, {0xc005873fc0, 0x1, ...})
F220721 08:34:59.642778 6364684 1@util/log/file.go:258 ⋮ [-] 2975 !	github.com/cockroachdb/cockroach/pkg/util/log/channels.go:106 +0x645
F220721 08:34:59.642778 6364684 1@util/log/file.go:258 ⋮ [-] 2975 !github.com/cockroachdb/cockroach/pkg/util/log.shoutfDepth(...)
F220721 08:34:59.642778 6364684 1@util/log/file.go:258 ⋮ [-] 2975 !	github.com/cockroachdb/cockroach/pkg/util/log/channels.go:113
F220721 08:34:59.642778 6364684 1@util/log/file.go:258 ⋮ [-] 2975 !github.com/cockroachdb/cockroach/pkg/util/log.loggerOps.Shoutf(...)
F220721 08:34:59.642778 6364684 1@util/log/file.go:258 ⋮ [-] 2975 !	github.com/cockroachdb/cockroach/bazel-out/k8-opt/bin/pkg/util/log/log_channels_generated.go:1422
F220721 08:34:59.642778 6364684 1@util/log/file.go:258 ⋮ [-] 2975 !github.com/cockroachdb/cockroach/pkg/util/log.(*fileSink).flushAndMaybeSyncLocked.func1()
F220721 08:34:59.642778 6364684 1@util/log/file.go:258 ⋮ [-] 2975 !	github.com/cockroachdb/cockroach/pkg/util/log/file.go:258 +0x96
F220721 08:34:59.642778 6364684 1@util/log/file.go:258 ⋮ [-] 2975 !created by time.goFunc
F220721 08:34:59.642778 6364684 1@util/log/file.go:258 ⋮ [-] 2975 !	GOROOT/src/time/sleep.go:180 +0x31
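
For readers unfamiliar with this failure mode: the trace above bottoms out in `time.goFunc`, which is where Go runs `time.AfterFunc` callbacks, so the fatal appears to come from a timer armed around the log sync. Below is a minimal Go sketch of that watchdog pattern. It is an illustration only, not CockroachDB's code: `syncWithWatchdog` and `stallDetected` are hypothetical names, and where the real logger terminates the process, this sketch only sets a flag.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// syncWithWatchdog (hypothetical name) runs syncFn with a timer armed around
// it. If syncFn has not returned within timeout, onStall runs on a separate
// goroutine. If syncFn returns in time, the timer is stopped and onStall
// never runs.
func syncWithWatchdog(timeout time.Duration, syncFn func() error, onStall func()) error {
	t := time.AfterFunc(timeout, onStall)
	defer t.Stop()
	return syncFn()
}

// stallDetected (hypothetical demo helper) simulates a sync that takes syncMs
// milliseconds under a watchdog of timeoutMs milliseconds, and reports
// whether the watchdog fired.
func stallDetected(timeoutMs, syncMs int) bool {
	var stalled int32
	_ = syncWithWatchdog(
		time.Duration(timeoutMs)*time.Millisecond,
		func() error {
			// Stand-in for a blocking fsync on a slow block device.
			time.Sleep(time.Duration(syncMs) * time.Millisecond)
			return nil
		},
		// A real server would log a fatal error and exit here.
		func() { atomic.StoreInt32(&stalled, 1) },
	)
	return atomic.LoadInt32(&stalled) == 1
}

func main() {
	fmt.Println("fast sync tripped watchdog:", stallDetected(500, 0))
	fmt.Println("slow sync tripped watchdog:", stallDetected(20, 300))
}
```

The point of the pattern is that the callback runs independently of the blocked sync call, which is why a node can report the stall and crash even though the disk write itself never returns.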

Removing the release-blocker label, but we probably want to dig into why this roachtest and its other variant failed in the same run with the same failure mode (see #84796).

@nicktrav nicktrav added T-storage Storage Team and removed T-storage Storage Team release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jul 21, 2022
@nicktrav
Collaborator

Kicked off a manual run (logs).

@nicktrav
Collaborator

Hit some roachtest issues, and had to run the test again at a more recent SHA. The run succeeded, so, like #84796, I'm hoping this was just a hardware thing.

Will leave this open for a few more days, just in case it collects some more failures.

@nicktrav
Collaborator

> Will leave this open for a few more days, just in case it collects some more failures.

Given this hasn't fired again in the past few days, I'm going to close this.

@nicktrav nicktrav added the X-unactionable This was closed because it was unactionable. label Dec 12, 2022