roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [overload,closed ts regressing from X to Y] #61981

Closed
cockroach-teamcity opened this issue Mar 14, 2021 · 31 comments
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot.

Comments

@cockroach-teamcity
Member

(roachtest).tpccbench/nodes=9/cpu=4/multi-region failed on master@bdff5338ca725bf1cfddf7e3f648bbf02ab42999:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
	cluster.go:2688,tpcc.go:785,tpcc.go:617,test_runner.go:767: monitor failure: monitor task failed: failed with output "I210314 10:02:42.292720 1 workload/cli/run.go:361  [-] 1  creating load generator...\nInitializing 5000 connections...\nInitializing 5000 workers and preparing statements...\nE210314 11:04:39.400945 1 workload/cli/run.go:384  [-] 2  Attempt to create load generator failed. It's been more than 1h0m0s since we started trying to create the load generator so we're giving up. Last failure: failed to initialize the load generator: preparing \nE210314 11:04:39.400945 1 workload/cli/run.go:384  [-] 2 +\t\tUPDATE district\nE210314 11:04:39.400945 1 workload/cli/run.go:384  [-] 2 +\t\tSET d_next_o_id = d_next_o_id + 1\nE210314 11:04:39.400945 1 workload/cli/run.go:384  [-] 2 +\t\tWHERE d_w_id = $1 AND d_id = $2\nE210314 11:04:39.400945 1 workload/cli/run.go:384  [-] 2 +\t\tRETURNING d_tax, d_next_o_id: context deadline exceeded\nError: failed to initialize the load generator: preparing \n\t\tUPDATE district\n\t\tSET d_next_o_id = d_next_o_id + 1\n\t\tWHERE d_w_id = $1 AND d_id = $2\n\t\tRETURNING d_tax, d_next_o_id: context deadline exceeded\nError: COMMAND_PROBLEM: exit status 1\n(1) COMMAND_PROBLEM\nWraps: (2) Node 4. Command with error:\n  | ```\n  | ./cockroach workload run tpcc --warehouses=5000 --workers=5000 --max-rate=736 --wait=false --ramp=25m0s --duration=1h15m0s --scatter --tolerate-errors {pgurl:1-3,5-7,9-11}\n  | ```\nWraps: (3) exit status 1\nError types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError\n": /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2775164-1615705178-42-n12cpu4-geo:4 -- ./cockroach workload run tpcc --warehouses=5000 --workers=5000 --max-rate=736 --wait=false --ramp=25m0s --duration=1h15m0s --scatter --tolerate-errors {pgurl:1-3,5-7,9-11}: exit status 20
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2676
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2684
		  | main.runTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:785
		  | main.registerTPCCBenchSpec.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:617
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:767
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2732
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.loadTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:710
		  | [...repeated from below...]
		Wraps: (6) failed with output "I210314 10:02:42.292720 1 workload/cli/run.go:361  [-] 1  creating load generator...\nInitializing 5000 connections...\nInitializing 5000 workers and preparing statements...\nE210314 11:04:39.400945 1 workload/cli/run.go:384  [-] 2  Attempt to create load generator failed. It's been more than 1h0m0s since we started trying to create the load generator so we're giving up. Last failure: failed to initialize the load generator: preparing \nE210314 11:04:39.400945 1 workload/cli/run.go:384  [-] 2 +\t\tUPDATE district\nE210314 11:04:39.400945 1 workload/cli/run.go:384  [-] 2 +\t\tSET d_next_o_id = d_next_o_id + 1\nE210314 11:04:39.400945 1 workload/cli/run.go:384  [-] 2 +\t\tWHERE d_w_id = $1 AND d_id = $2\nE210314 11:04:39.400945 1 workload/cli/run.go:384  [-] 2 +\t\tRETURNING d_tax, d_next_o_id: context deadline exceeded\nError: failed to initialize the load generator: preparing \n\t\tUPDATE district\n\t\tSET d_next_o_id = d_next_o_id + 1\n\t\tWHERE d_w_id = $1 AND d_id = $2\n\t\tRETURNING d_tax, d_next_o_id: context deadline exceeded\nError: COMMAND_PROBLEM: exit status 1\n(1) COMMAND_PROBLEM\nWraps: (2) Node 4. Command with error:\n  | ```\n  | ./cockroach workload run tpcc --warehouses=5000 --workers=5000 --max-rate=736 --wait=false --ramp=25m0s --duration=1h15m0s --scatter --tolerate-errors {pgurl:1-3,5-7,9-11}\n  | ```\nWraps: (3) exit status 1\nError types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError\n"
		Wraps: (7) attached stack trace
		  -- stack trace:
		  | main.execCmdWithBuffer
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:566
		  | main.(*cluster).RunWithBuffer
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2353
		  | main.loadTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:709
		  | main.runTPCCBench.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:783
		  | main.(*monitor).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2666
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (8) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2775164-1615705178-42-n12cpu4-geo:4 -- ./cockroach workload run tpcc --warehouses=5000 --workers=5000 --max-rate=736 --wait=false --ramp=25m0s --duration=1h15m0s --scatter --tolerate-errors {pgurl:1-3,5-7,9-11}
		Wraps: (9) exit status 20
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *withstack.withStack (8) *errutil.withPrefix (9) *exec.ExitError

More

Artifacts: /tpccbench/nodes=9/cpu=4/multi-region
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Mar 14, 2021
@irfansharif irfansharif self-assigned this Mar 15, 2021
@irfansharif
Contributor

Here's this test's history:

[image: test history]

The same analysis as #61973 (comment) applies. (tl;dr #59992 made things worse ~Feb 15; #61777 improved things ~ Mar 13th). As for the particular failure above:

Attempt to create load generator failed. It's been more than 1h0m0s since we started trying to create the load generator so we're giving up. Last failure: failed to initialize the load generator

Seems unrelated to everything?
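
For context on the quoted message: the workload tool appears to keep retrying load-generator creation and only gives up once an hour has passed, reporting the last failure it saw. A minimal sketch of that retry-until-deadline pattern follows. It is not the code in `workload/cli/run.go`; `retryUntil`, `demo`, and the short durations are hypothetical names and values chosen for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryUntil is a hypothetical helper: it calls op repeatedly until op
// succeeds or giveUpAfter has elapsed, then returns the last error seen.
func retryUntil(ctx context.Context, giveUpAfter, pause time.Duration, op func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, giveUpAfter)
	defer cancel()
	var lastErr error
	for {
		if lastErr = op(ctx); lastErr == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			// Deadline reached: surface the most recent failure to the caller.
			return fmt.Errorf("giving up after %s; last failure: %w", giveUpAfter, lastErr)
		case <-time.After(pause):
			// Wait briefly, then try again.
		}
	}
}

// demo runs retryUntil against an operation that always fails and reports
// how many attempts were made before giving up.
func demo() (attempts int, err error) {
	err = retryUntil(context.Background(), 50*time.Millisecond, 10*time.Millisecond,
		func(context.Context) error {
			attempts++
			return errors.New("preparing statement: context deadline exceeded")
		})
	return attempts, err
}

func main() {
	attempts, err := demo()
	fmt.Printf("attempts=%d err=%v\n", attempts, err)
}
```

With a pattern like this, the final error only names whichever statement happened to fail last, which would be consistent with different statements showing up in different runs.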

@irfansharif irfansharif changed the title roachtest: tpccbench/nodes=9/cpu=4/multi-region failed roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [attempt to create load generator failed] Mar 15, 2021
@irfansharif
Contributor

Same as #61181? @nvanbenschoten, know what's up?

@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/multi-region failed on master@597e4a8c487e3c23d64885563d608a692b59055c:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
	cluster.go:2688,tpcc.go:785,tpcc.go:617,test_runner.go:768: monitor failure: monitor task failed: failed with output "I210317 09:43:36.117058 1 workload/cli/run.go:361  [-] 1  creating load generator...\nInitializing 5000 connections...\nInitializing 5000 workers and preparing statements...\nE210317 10:43:36.435345 1 workload/cli/run.go:384  [-] 2  Attempt to create load generator failed. It's been more than 1h0m0s since we started trying to create the load generator so we're giving up. Last failure: failed to initialize the load generator: preparing \nE210317 10:43:36.435345 1 workload/cli/run.go:384  [-] 2 +\t\tINSERT INTO \"order\" (o_id, o_d_id, o_w_id, o_c_id, o_entry_d, o_ol_cnt, o_all_local)\nE210317 10:43:36.435345 1 workload/cli/run.go:384  [-] 2 +\t\tVALUES ($1, $2, $3, $4, $5, $6, $7): context deadline exceeded\nError: failed to initialize the load generator: preparing \n\t\tINSERT INTO \"order\" (o_id, o_d_id, o_w_id, o_c_id, o_entry_d, o_ol_cnt, o_all_local)\n\t\tVALUES ($1, $2, $3, $4, $5, $6, $7): context deadline exceeded\nError: COMMAND_PROBLEM: exit status 1\n(1) COMMAND_PROBLEM\nWraps: (2) Node 4. Command with error:\n  | ```\n  | ./cockroach workload run tpcc --warehouses=5000 --workers=5000 --max-rate=736 --wait=false --ramp=25m0s --duration=1h15m0s --scatter --tolerate-errors {pgurl:1-3,5-7,9-11}\n  | ```\nWraps: (3) exit status 1\nError types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError\n": /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2785149-1615960994-40-n12cpu4-geo:4 -- ./cockroach workload run tpcc --warehouses=5000 --workers=5000 --max-rate=736 --wait=false --ramp=25m0s --duration=1h15m0s --scatter --tolerate-errors {pgurl:1-3,5-7,9-11}: exit status 20
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2676
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2684
		  | main.runTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:785
		  | main.registerTPCCBenchSpec.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:617
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:768
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2732
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.loadTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:710
		  | [...repeated from below...]
		Wraps: (6) failed with output "I210317 09:43:36.117058 1 workload/cli/run.go:361  [-] 1  creating load generator...\nInitializing 5000 connections...\nInitializing 5000 workers and preparing statements...\nE210317 10:43:36.435345 1 workload/cli/run.go:384  [-] 2  Attempt to create load generator failed. It's been more than 1h0m0s since we started trying to create the load generator so we're giving up. Last failure: failed to initialize the load generator: preparing \nE210317 10:43:36.435345 1 workload/cli/run.go:384  [-] 2 +\t\tINSERT INTO \"order\" (o_id, o_d_id, o_w_id, o_c_id, o_entry_d, o_ol_cnt, o_all_local)\nE210317 10:43:36.435345 1 workload/cli/run.go:384  [-] 2 +\t\tVALUES ($1, $2, $3, $4, $5, $6, $7): context deadline exceeded\nError: failed to initialize the load generator: preparing \n\t\tINSERT INTO \"order\" (o_id, o_d_id, o_w_id, o_c_id, o_entry_d, o_ol_cnt, o_all_local)\n\t\tVALUES ($1, $2, $3, $4, $5, $6, $7): context deadline exceeded\nError: COMMAND_PROBLEM: exit status 1\n(1) COMMAND_PROBLEM\nWraps: (2) Node 4. Command with error:\n  | ```\n  | ./cockroach workload run tpcc --warehouses=5000 --workers=5000 --max-rate=736 --wait=false --ramp=25m0s --duration=1h15m0s --scatter --tolerate-errors {pgurl:1-3,5-7,9-11}\n  | ```\nWraps: (3) exit status 1\nError types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError\n"
		Wraps: (7) attached stack trace
		  -- stack trace:
		  | main.execCmdWithBuffer
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:566
		  | main.(*cluster).RunWithBuffer
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2353
		  | main.loadTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:709
		  | main.runTPCCBench.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:783
		  | main.(*monitor).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2666
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (8) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2785149-1615960994-40-n12cpu4-geo:4 -- ./cockroach workload run tpcc --warehouses=5000 --workers=5000 --max-rate=736 --wait=false --ramp=25m0s --duration=1h15m0s --scatter --tolerate-errors {pgurl:1-3,5-7,9-11}
		Wraps: (9) exit status 20
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *withstack.withStack (8) *errutil.withPrefix (9) *exec.ExitError

More

Artifacts: /tpccbench/nodes=9/cpu=4/multi-region
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/multi-region failed on master@36dea46f8cedf42df31b57dd70db7e0f1fd7a453:

		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2676
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2684
		  | main.runTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:785
		  | main.registerTPCCBenchSpec.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:617
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:768
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 7: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1667,context.go:140,cluster.go:1656,test_runner.go:849: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2788995-1616047269-42-n12cpu4-geo --oneshot --ignore-empty-nodes: exit status 1 4: skipped
		8: skipped
		3: 6024
		1: 6071
		12: skipped
		7: dead
		6: 6752
		2: 6052
		5: 5707
		10: 5711
		9: 5747
		11: 5670
		Error: UNCLASSIFIED_PROBLEM: 7: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 7: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError
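
The dead node detection output above lists one `<node>: <status>` pair per line, where the status is a PID, `dead`, or `skipped`. A small sketch of how such output could be scanned for dead nodes is shown below. It is an illustration only, not roachprod's implementation; `deadNodes` is a hypothetical helper and the line format is inferred from this log.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// deadNodes scans monitor-style output ("<node>: <pid|dead|skipped>") and
// returns the IDs of nodes reported as dead, in ascending order.
func deadNodes(output string) []int {
	var dead []int
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		node, status, ok := strings.Cut(strings.TrimSpace(sc.Text()), ": ")
		if !ok || status != "dead" {
			continue
		}
		id, err := strconv.Atoi(node)
		if err != nil {
			continue // not a "<node>: dead" line, e.g. part of an error message
		}
		dead = append(dead, id)
	}
	sort.Ints(dead)
	return dead
}

func main() {
	// Sample lines copied from the report above.
	out := "4: skipped\n8: skipped\n3: 6024\n1: 6071\n12: skipped\n7: dead\n6: 6752\n"
	fmt.Println(deadNodes(out)) // prints [7]
}
```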

More

Artifacts: /tpccbench/nodes=9/cpu=4/multi-region
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/multi-region failed on master@ee9f47b9ec9476a693464e2dcd09a01bf9d39ad2:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
	cluster.go:2688,tpcc.go:785,tpcc.go:617,test_runner.go:768: monitor failure: unexpected node event: 3: dead
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2676
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2684
		  | main.runTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:785
		  | main.registerTPCCBenchSpec.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:617
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:768
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 3: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

More

Artifacts: /tpccbench/nodes=9/cpu=4/multi-region
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/multi-region failed on master@3d19b2cf6b290a152b23722fc32e995eed3b437b:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
	cluster.go:2688,tpcc.go:785,tpcc.go:617,test_runner.go:768: monitor failure: unexpected node event: 11: dead
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2676
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2684
		  | main.runTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:785
		  | main.registerTPCCBenchSpec.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:617
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:768
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 11: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

More

Artifacts: /tpccbench/nodes=9/cpu=4/multi-region
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/multi-region failed on master@893643b63ea0b1cfa4888c6b73b5c68a9c100c3a:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
	cluster.go:2688,tpcc.go:785,tpcc.go:617,test_runner.go:768: monitor failure: unexpected node event: 3: dead
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2676
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2684
		  | main.runTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:785
		  | main.registerTPCCBenchSpec.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:617
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:768
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 3: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

More

Artifacts: /tpccbench/nodes=9/cpu=4/multi-region
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/multi-region failed on master@53bf501e233c337b9863755914d9c00010517329:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
	cluster.go:2220,tpcc.go:807,search.go:43,search.go:173,tpcc.go:803,tpcc.go:617,test_runner.go:768: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod stop teamcity-2802936-1616478847-43-n12cpu4-geo:1-3,5-7,9-11 returned: exit status 1
		(1) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod stop teamcity-2802936-1616478847-43-n12cpu4-geo:1-3,5-7,9-11 returned
		  | stderr:
		  |
		  | stdout:
		  | <... some data truncated by circular buffer; go to artifacts for details ...>
		  |
		  | 4: exit status 255: 
		  | I210323 13:29:02.941341 1 (gostd) cluster_synced.go:1732  [-] 1  command failed
		Wraps: (2) exit status 1
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

More

Artifacts: /tpccbench/nodes=9/cpu=4/multi-region
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/multi-region failed on master@9fa4b125bfb07552b43ba4fd52c9301afd7a937b:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
	cluster.go:2688,tpcc.go:785,tpcc.go:617,test_runner.go:768: monitor failure: unexpected node event: 11: dead
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2676
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2684
		  | main.runTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:785
		  | main.registerTPCCBenchSpec.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:617
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:768
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 11: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

More

Artifacts: /tpccbench/nodes=9/cpu=4/multi-region
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@tbg tbg assigned tbg and unassigned nvanbenschoten Mar 24, 2021
@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/multi-region failed on master@cbebc6e05491c6951216993ed5e12e22504624f2:

		7: 8797
		2: 10260
		5: 9119
		9: dead
		11: 8550
		10: 8460
		Error: UNCLASSIFIED_PROBLEM: 6: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) secondary error attachment
		  | 9: dead
		  | (1) attached stack trace
		  |   -- stack trace:
		  |   | main.glob..func14
		  |   | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  |   | main.wrap.func1
		  |   | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  |   | github.com/spf13/cobra.(*Command).execute
		  |   | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  |   | github.com/spf13/cobra.(*Command).ExecuteC
		  |   | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  |   | github.com/spf13/cobra.(*Command).Execute
		  |   | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  |   | main.main
		  |   | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  |   | runtime.main
		  |   | 	/usr/local/go/src/runtime/proc.go:204
		  |   | runtime.goexit
		  |   | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		  | Wraps: (2) 9: dead
		  | Error types: (1) *withstack.withStack (2) *errutil.leafError
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (4) 6: dead
		Error types: (1) errors.Unclassified (2) *secondary.withSecondaryError (3) *withstack.withStack (4) *errutil.leafError

More

Artifacts: /tpccbench/nodes=9/cpu=4/multi-region
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 19, 2021
This patch adds historical information to the assertion against closed
timestamp regressions. We've seen that assertion fire in cockroachdb#61981.
The replica now maintains info about what command last bumped the
ClosedTimestamp.

Release note: None
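
To illustrate what the commit message describes, here is a rough sketch of an assertion that remembers which command last advanced the closed timestamp and includes that history when a regression is detected. It is not the actual replica code: `closedTSState`, its fields, and the use of plain nanosecond integers for timestamps are simplifying assumptions.

```go
package main

import "fmt"

// closedTSState is a hypothetical stand-in for the per-replica state: the
// current closed timestamp plus a record of the command that last bumped it.
type closedTSState struct {
	closedNanos int64  // current closed timestamp (simplified to wall nanos)
	lastBumpCmd string // ID of the command that last advanced it
	lastBumpIdx uint64 // Raft log index of that command
}

// apply advances the closed timestamp for a command. If the command carries
// a lower closed timestamp than the current one, it returns an error that
// includes the recorded history instead of updating the state.
func (s *closedTSState) apply(cmdID string, idx uint64, closedNanos int64) error {
	if closedNanos < s.closedNanos {
		return fmt.Errorf(
			"closed ts regressing from %d to %d (cmd %s at index %d); last bumped by cmd %s at index %d",
			s.closedNanos, closedNanos, cmdID, idx, s.lastBumpCmd, s.lastBumpIdx)
	}
	if closedNanos > s.closedNanos {
		s.closedNanos = closedNanos
		s.lastBumpCmd, s.lastBumpIdx = cmdID, idx
	}
	return nil
}

func main() {
	var s closedTSState
	fmt.Println(s.apply("cmd-a", 10, 100)) // advances the closed timestamp
	fmt.Println(s.apply("cmd-b", 11, 90))  // reports a regression with history
}
```

Keeping the last bumper around costs a few bytes per replica and turns a bare "regressing from X to Y" into a message that names both commands involved.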
@tbg tbg changed the title roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [overload] roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [overload,closed ts regressing from X to Y] Apr 20, 2021
@tbg tbg assigned andreimatei and unassigned tbg Apr 20, 2021
@tbg tbg added the GA-blocker label Apr 20, 2021
@tbg
Member

tbg commented Apr 20, 2021

Added GA-blocker due to the closed timestamp regression in #61981 (comment)

Some more context: #62655 (comment)

andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 20, 2021
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 21, 2021
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 21, 2021
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 22, 2021
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 22, 2021
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 23, 2021
This patch improves the closed timestamp regression assertion we've seen
fire in cockroachdb#61981 to include a tail of the Raft log.
Hopefully we never see that assertion fire again, but still I'd like to
introduce a precedent for easily printing the log programmatically.

Also, the assertion now tells people about
COCKROACH_RAFT_CLOSEDTS_ASSERTIONS_ENABLED. If the assertion fires and
crashes nodes, those nodes will continue crashing on restart as they try
to apply the same entries over and over.

Release note: None
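
The crash-loop concern can be made concrete with a sketch of an environment-variable gate around the assertion. Only the variable name `COCKROACH_RAFT_CLOSEDTS_ASSERTIONS_ENABLED` comes from the commit message; the helper names, the default of "enabled", and the assumption that setting the variable to `false` turns the check off are hypothetical, and the real implementation may differ.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

const assertionsEnvVar = "COCKROACH_RAFT_CLOSEDTS_ASSERTIONS_ENABLED"

// assertionsEnabled reports whether the regression check should be enforced.
// Assumption: an unset or unparsable value means the assertion stays on.
func assertionsEnabled(lookup func(string) (string, bool)) bool {
	v, ok := lookup(assertionsEnvVar)
	if !ok {
		return true
	}
	enabled, err := strconv.ParseBool(v)
	if err != nil {
		return true
	}
	return enabled
}

// checkClosedTS returns an error on a regression when the assertion is
// enabled. The message names the variable so that an operator whose node
// keeps crashing on the same log entries knows how to get past it.
func checkClosedTS(prev, next int64, enabled bool) error {
	if next >= prev || !enabled {
		return nil
	}
	return fmt.Errorf("closed ts regressing from %d to %d; set %s=false to disable this assertion",
		prev, next, assertionsEnvVar)
}

func main() {
	enabled := assertionsEnabled(os.LookupEnv)
	if err := checkClosedTS(100, 90, enabled); err != nil {
		fmt.Println("would crash:", err)
	} else {
		fmt.Println("no assertion failure reported")
	}
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the gate easy to exercise without touching the process environment.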
@andreimatei
Contributor

I've removed the GA-blocker since I couldn't repro the closed ts regression and it hasn't shown up through other channels since either. I think it's time to let go. My hope is that we've fixed the issue somehow. If not, the respective assertion will give us more info next time it happens...

@tbg
Member

tbg commented Apr 26, 2021

#61981 (comment) is an infra fluke:

02:07:24 cluster.go:1169: test status: resetting cluster
02:07:24 cluster.go:387: > /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod reset teamcity-2917180-1619205254-45-n12cpu4-geo
Error: Command: gcloud [compute instances reset --project cockroach-ephemeral --zone us-west1-b teamcity-2917180-1619205254-45-n12cpu4-geo-0005 teamcity-2917180-1619205254-45-n12cpu4-geo-0006 teamcity-2917180-1619205254-45-n12cpu4-geo-0007 teamcity-2917180-1619205254-45-n12cpu4-geo-0008]: exit status 1
(1) attached stack trace
  -- stack trace:
  | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/vm/gce.(*Provider).Reset.func1
  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/vm/gce/gcloud.go:551
  | golang.org/x/sync/errgroup.(*Group).Go.func1
  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
  | runtime.goexit
  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
Wraps: (2) Command: gcloud [compute instances reset --project cockroach-ephemeral --zone us-west1-b teamcity-2917180-1619205254-45-n12cpu4-geo-0005 teamcity-2917180-1619205254-45-n12cpu4-geo-0006 teamcity-2917180-1619205254-45-n12cpu4-geo-0007 teamcity-2917180-1619205254-45-n12cpu4-geo-0008]
  | Output: Updated [https://www.googleapis.com/compute/v1/projects/cockroach-ephemeral/zones/us-west1-b/instances/teamcity-2917180-1619205254-45-n12cpu4-geo-0005].
  | Updated [https://www.googleapis.com/compute/v1/projects/cockroach-ephemeral/zones/us-west1-b/instances/teamcity-2917180-1619205254-45-n12cpu4-geo-0006].
  | Updated [https://www.googleapis.com/compute/v1/projects/cockroach-ephemeral/zones/us-west1-b/instances/teamcity-2917180-1619205254-45-n12cpu4-geo-0007].
  | Updated [https://www.googleapis.com/compute/v1/projects/cockroach-ephemeral/zones/us-west1-b/instances/teamcity-2917180-1619205254-45-n12cpu4-geo-0008].
  | ERROR: (gcloud.compute.instances.reset) Could not fetch resource:
  |  - Internal error. Please try again or contact Google Support. (Code: '5C0AE6051792F.A254E03.EC00C5E4')

Great

tbg added a commit to tbg/cockroach that referenced this issue Apr 26, 2021
@tbg
Member

tbg commented Apr 26, 2021

#61981 (comment) had n5 die during the initial rebalancing period. The logs look very unhappy, overloaded basically with a number of unavailable ranges due to snapshot problems. This is running with

./cockroach workload run tpcc --warehouses=3000 --workers=3000 --max-rate=490 --wait=false --ramp=15m0s --duration=45m0s --scatter --tolerate-errors {pgurl:1-3,5-7,9-11}

The SHA didn't have #64060, which is why I hope that the next repro will look cleaner. I am tempted to ignore it this time for that reason.

@tbg
Member

tbg commented Apr 26, 2021

I've removed the GA-blocker since I couldn't repro the closed ts regression and it hasn't shown up through other channels since either. I think it's time to let go. My hope is that we've fixed the issue somehow. If not, the respective assertion will give us more info next time it happens...

Will the assertion lead us to this issue? I would like to close this issue to take it off the docket since it's now unactionable.

@tbg tbg closed this as completed Apr 26, 2021
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 26, 2021
tbg added a commit to tbg/cockroach that referenced this issue Jun 24, 2021