Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: kv/contention/nodes=4 failed #80823

Closed
cockroach-teamcity opened this issue Apr 30, 2022 · 8 comments · Fixed by #81796
Closed

roachtest: kv/contention/nodes=4 failed #80823

cockroach-teamcity opened this issue Apr 30, 2022 · 8 comments · Fixed by #81796
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-kv KV Team T-sql-queries SQL Queries Team

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Apr 30, 2022

roachtest.kv/contention/nodes=4 failed with artifacts on master @ a2e1910f51593bd2ef72e1d7c615e08f95791186:

		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_061117.846761679_n5_workload_run_kv
		Wraps: (3) ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned
		  | stderr:
		  | I220430 06:11:19.158387 1 workload/cli/run.go:373  [-] 1  DEPRECATION: the --init flag on "workload run" will no longer be supported after 19.2
		  | I220430 06:11:19.224727 1 workload/workloadsql/workloadsql.go:136  [-] 2  starting 4 splits
		  | I220430 06:11:19.897023 1 workload/cli/run.go:414  [-] 3  creating load generator...
		  | I220430 06:11:19.962557 1 workload/cli/run.go:445  [-] 4  creating load generator... done (took 65.534113ms)
		  | I220430 06:11:20.432262 913 workload/pgx_helpers.go:79  [-] 5  pgx logger [error]: Exec logParams=map[args:[2964940846739993213 4c -8472975028287664194 b1 5840144559726473562 f1 3003351574143778344 55] err:ERROR: duplicate key value violates unique constraint "kv_pkey" (SQLSTATE 23505) pid:3391634 sql:kv-2]
		  | Error: ERROR: duplicate key value violates unique constraint "kv_pkey" (SQLSTATE 23505)
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 5. Command with error:
		  | ``````
		  | ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4}
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,kv.go:342,test_runner.go:876: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerKVContention.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/kv.go:342
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:876
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6498
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

Jira issue: CRDB-15500

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Apr 30, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Apr 30, 2022
@cockroach-teamcity
Copy link
Member Author

roachtest.kv/contention/nodes=4 failed with artifacts on master @ c9e0194b19a03d55c6be92572aad3fbafc256334:

		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_061037.673782570_n5_workload_run_kv
		Wraps: (3) ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned
		  | stderr:
		  | I220501 06:10:39.024837 1 workload/cli/run.go:373  [-] 1  DEPRECATION: the --init flag on "workload run" will no longer be supported after 19.2
		  | I220501 06:10:39.089626 1 workload/workloadsql/workloadsql.go:136  [-] 2  starting 4 splits
		  | I220501 06:10:39.902783 1 workload/cli/run.go:414  [-] 3  creating load generator...
		  | I220501 06:10:39.971790 1 workload/cli/run.go:445  [-] 4  creating load generator... done (took 69.007092ms)
		  | I220501 06:10:40.422265 873 workload/pgx_helpers.go:79  [-] 5  pgx logger [error]: Exec logParams=map[args:[-9075366310165454498 15 -6464583208428413334 45 7126266934615051019 22 -7622296216717729017 72] err:ERROR: duplicate key value violates unique constraint "kv_pkey" (SQLSTATE 23505) pid:2702466 sql:kv-2]
		  | Error: ERROR: duplicate key value violates unique constraint "kv_pkey" (SQLSTATE 23505)
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 5. Command with error:
		  | ``````
		  | ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4}
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,kv.go:342,test_runner.go:876: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerKVContention.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/kv.go:342
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:876
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6498
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

roachtest.kv/contention/nodes=4 failed with artifacts on master @ 7a9eb906ce86e2f75db637e29d46cd6604fca7b4:

		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_060953.210658924_n5_workload_run_kv
		Wraps: (3) ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned
		  | stderr:
		  | I220502 06:09:54.499907 1 workload/cli/run.go:373  [-] 1  DEPRECATION: the --init flag on "workload run" will no longer be supported after 19.2
		  | I220502 06:09:54.567765 1 workload/workloadsql/workloadsql.go:136  [-] 2  starting 4 splits
		  | I220502 06:09:55.242398 1 workload/cli/run.go:414  [-] 3  creating load generator...
		  | I220502 06:09:55.304076 1 workload/cli/run.go:445  [-] 4  creating load generator... done (took 61.678858ms)
		  | I220502 06:09:55.630847 851 workload/pgx_helpers.go:79  [-] 5  pgx logger [error]: Exec logParams=map[args:[2964940846739993213 69 -8472975028287664194 52 5840144559726473562 0b 3003351574143778344 46] err:ERROR: duplicate key value violates unique constraint "kv_pkey" (SQLSTATE 23505) pid:3309338 sql:kv-2]
		  | Error: ERROR: duplicate key value violates unique constraint "kv_pkey" (SQLSTATE 23505)
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 5. Command with error:
		  | ``````
		  | ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4}
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,kv.go:342,test_runner.go:876: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerKVContention.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/kv.go:342
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:876
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6498
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@nvanbenschoten
Copy link
Member

This is a pretty clear regression on master. Since the failure rate is so high but I don't see something obvious in the git log, I'll jump right to bisecting.

@nvanbenschoten
Copy link
Member

I bisected this conclusively to 9f02be7, which just landed on master and was not backported.

Assigning to @yuzefovich.

@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label May 2, 2022
@nvanbenschoten nvanbenschoten removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label May 2, 2022
@yuzefovich
Copy link
Member

Thanks Nathan, I'll take a look.

@cockroach-teamcity
Copy link
Member Author

roachtest.kv/contention/nodes=4 failed with artifacts on master @ 01572daaf94f80f81f843723a8b58d80fe128990:

		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_061132.895336069_n5_workload_run_kv
		Wraps: (3) ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned
		  | stderr:
		  | I220503 06:11:34.225397 1 workload/cli/run.go:373  [-] 1  DEPRECATION: the --init flag on "workload run" will no longer be supported after 19.2
		  | I220503 06:11:34.289564 1 workload/workloadsql/workloadsql.go:136  [-] 2  starting 4 splits
		  | I220503 06:11:34.630042 1 workload/cli/run.go:414  [-] 3  creating load generator...
		  | I220503 06:11:34.696341 1 workload/cli/run.go:445  [-] 4  creating load generator... done (took 66.300071ms)
		  | I220503 06:11:35.107081 927 workload/pgx_helpers.go:79  [-] 5  pgx logger [error]: Exec logParams=map[args:[-1562809780010725617 82 -3638783794381887781 23 -7099029285357836528 72 2434822016396116439 ba] err:ERROR: duplicate key value violates unique constraint "kv_pkey" (SQLSTATE 23505) pid:1890621 sql:kv-2]
		  | Error: ERROR: duplicate key value violates unique constraint "kv_pkey" (SQLSTATE 23505)
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 5. Command with error:
		  | ``````
		  | ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4}
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,kv.go:342,test_runner.go:876: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerKVContention.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/kv.go:342
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:876
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6498
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@yuzefovich
Copy link
Member

@nvanbenschoten I wonder whether off the top of your head you can say that this error is likely (or expected) to occur when P99 latency gets into 10 seconds range?

My hypothesis is that because the streamer currently doesn't support transparent refresh mechanism (i.e. automatic retries) and uses the leaf txns, the likelihood that that the write txn needs to be retried significantly increased with 9f02be7. Once P99 gets into 10-20 second range, I see the error occur locally.

@cockroach-teamcity
Copy link
Member Author

roachtest.kv/contention/nodes=4 failed with artifacts on master @ b01e1f3b3db311603dc7dd63647a8a737602143e:

		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_061002.966070528_n5_workload_run_kv
		Wraps: (3) ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned
		  | stderr:
		  | I220504 06:10:04.265907 1 workload/cli/run.go:373  [-] 1  DEPRECATION: the --init flag on "workload run" will no longer be supported after 19.2
		  | I220504 06:10:04.326687 1 workload/workloadsql/workloadsql.go:136  [-] 2  starting 4 splits
		  | I220504 06:10:04.994964 1 workload/cli/run.go:414  [-] 3  creating load generator...
		  | I220504 06:10:05.071907 1 workload/cli/run.go:445  [-] 4  creating load generator... done (took 76.943226ms)
		  | I220504 06:10:05.209428 810 workload/pgx_helpers.go:79  [-] 5  pgx logger [error]: Exec logParams=map[args:[-8617536109860315868 76 1464797858451214926 5d -2967608882202863871 ff 1139302971275172453 36] err:ERROR: duplicate key value violates unique constraint "kv_pkey" (SQLSTATE 23505) pid:1858984 sql:kv-2]
		  | Error: ERROR: duplicate key value violates unique constraint "kv_pkey" (SQLSTATE 23505)
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 5. Command with error:
		  | ``````
		  | ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4}
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,kv.go:342,test_runner.go:876: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerKVContention.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/kv.go:342
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:876
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6498
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

craig bot pushed a commit that referenced this issue May 4, 2022
80967: sql: disable streamer for lookup joins for now r=yuzefovich a=yuzefovich

There is some fallout in the roachtests from using the streamer for the
lookup join without ordering, so let's disable it for now while we're
figuring things out.

Informs: #80823.
Informs: #79870 (comment).

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
@jlinder jlinder added sync-me and removed sync-me labels May 20, 2022
@craig craig bot closed this as completed in 321e966 May 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-kv KV Team T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants