roachtest: import/tpcc/warehouses=4000/geo failed [raft sideloading oom] #70307

Closed
cockroach-teamcity opened this issue Sep 16, 2021 · 14 comments
Labels
branch-master (Failures and bugs on the master branch), C-test-failure (Broken test, automatically or manually discovered), no-test-failure-activity, O-roachtest, O-robot (Originated from a bot), S-1 (High impact: many users impacted, serious risk of high unavailability or data loss), T-kv (KV Team), X-stale

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Sep 16, 2021

roachtest.import/tpcc/warehouses=4000/geo failed with artifacts on master @ 189259e803eca715307bfe0545c84189486a36c4:


	monitor.go:128,import.go:134,import.go:159,test_runner.go:777: monitor failure: unexpected node event: 3: dead (exit status 137)
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
		  | main.(*monitorImpl).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:134
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:159
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:777
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1371
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 3: dead (exit status 137)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1253,context.go:89,cluster.go:1241,test_runner.go:866: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-3452146-1631773069-49-n8cpu16-geo --oneshot --ignore-empty-nodes: exit status 1 7: 10457
		8: 10474
		1: 10473
		2: 10546
		4: 11519
		3: dead (exit status 137)
		6: 10030
		5: 9964
		Error: UNCLASSIFIED_PROBLEM: 3: dead (exit status 137)
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1173
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:281
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:2107
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:225
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1371
		Wraps: (3) 3: dead (exit status 137)
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError
Reproduce

See: roachtest README

/cc @cockroachdb/bulk-io

This test on roachdash | Improve this report!

Jira issue: CRDB-10024

cockroach-teamcity added the branch-master, C-test-failure, O-roachtest, O-robot, and release-blocker labels on Sep 16, 2021
@adityamaru
Contributor

adityamaru commented Sep 16, 2021

[ 1089.797881] cockroach invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

Node 3 was OOM killed.

@adityamaru
Contributor

adityamaru commented Sep 16, 2021

[Image: Screen Shot 2021-09-16 at 1 45 22 PM]

This seems to be growing across all the heap profiles in the debug zip.

@adityamaru
Contributor

I ran a couple of iterations of the roachtest and tracked the heap profiles, and did not see any concerning spikes in usage during the imports.

dt removed the release-blocker label on Sep 20, 2021
@adityamaru
Contributor

I'm handing this over to KV to close out in case they want to look at something I've missed. I haven't been able to reproduce it, and the test seems pretty green otherwise, so it could've just been a bad run.

@cockroach-teamcity
Member Author

roachtest.import/tpcc/warehouses=4000/geo failed with artifacts on master @ 40f11fead0a0453969634f8ddb0502c1f78b2806:

		  | main.(*clusterImpl).Run
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:1964
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func1.1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:140
		  | main.(*monitorImpl).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:106
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_231311.829968446_n1_cockroach_workload_fixtures_import_tpcc
		Wraps: (3) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-3804493-1637959164-44-n8cpu16-geo:1 -- ./cockroach workload fixtures import tpcc --warehouses=4000 --csv-server='http://localhost:8081' returned
		  | stderr:
		  | I211126 23:13:14.184316 1 ccl/workloadccl/fixture.go:345  [-] 1  starting import of 9 tables
		  | I211126 23:13:18.065889 24 ccl/workloadccl/fixture.go:502  [-] 2  imported 213 KiB in warehouse table (4000 rows, 0 index entries, took 1.899747266s, 0.11 MiB/s)
		  | I211126 23:13:18.345490 25 ccl/workloadccl/fixture.go:502  [-] 3  imported 3.9 MiB in district table (40000 rows, 0 index entries, took 2.179251894s, 1.81 MiB/s)
		  | I211126 23:13:19.816254 30 ccl/workloadccl/fixture.go:502  [-] 4  imported 7.8 MiB in item table (100000 rows, 0 index entries, took 3.649698792s, 2.13 MiB/s)
		  | I211126 23:14:22.384915 29 ccl/workloadccl/fixture.go:502  [-] 5  imported 512 MiB in new_order table (36000000 rows, 0 index entries, took 1m6.218485338s, 7.73 MiB/s)
		  | I211126 23:18:45.999695 27 ccl/workloadccl/fixture.go:502  [-] 6  imported 8.6 GiB in history table (120000000 rows, 0 index entries, took 5m29.833463437s, 26.72 MiB/s)
		  | I211126 23:20:52.330963 28 ccl/workloadccl/fixture.go:502  [-] 7  imported 6.3 GiB in order table (120000000 rows, 120000000 index entries, took 7m36.164548176s, 14.13 MiB/s)
		  | I211126 23:40:55.527518 26 ccl/workloadccl/fixture.go:502  [-] 8  imported 69 GiB in customer table (120000000 rows, 120000000 index entries, took 27m39.361272687s, 42.60 MiB/s)
		  |
		  | stdout:
		Wraps: (4) secondary error attachment
		  | signal: killed
		  | (1) signal: killed
		  | Error types: (1) *exec.ExitError
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	monitor.go:128,import.go:154,import.go:179,test_runner.go:779: monitor failure: unexpected node event: 8: dead (exit status 1)
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
		  | main.(*monitorImpl).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:154
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:179
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:779
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 8: dead (exit status 1)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1343,context.go:91,cluster.go:1333,test_runner.go:867: dead node detection: 8: dead (exit status 1)
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.import/tpcc/warehouses=4000/geo failed with artifacts on master @ b450fea83a7db1e06403b2563c13f38c9284b932:

		  | main.(*clusterImpl).Run
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:1964
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func1.1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:140
		  | main.(*monitorImpl).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:106
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_122709.828057920_n1_cockroach_workload_fixtures_import_tpcc
		Wraps: (3) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-3807010-1638000226-42-n8cpu16-geo:1 -- ./cockroach workload fixtures import tpcc --warehouses=4000 --csv-server='http://localhost:8081' returned
		  | stderr:
		  | I211127 12:27:12.293967 1 ccl/workloadccl/fixture.go:345  [-] 1  starting import of 9 tables
		  | I211127 12:27:19.118142 35 ccl/workloadccl/fixture.go:502  [-] 2  imported 213 KiB in warehouse table (4000 rows, 0 index entries, took 3.620870759s, 0.06 MiB/s)
		  | I211127 12:27:19.193341 41 ccl/workloadccl/fixture.go:502  [-] 3  imported 7.8 MiB in item table (100000 rows, 0 index entries, took 3.695676355s, 2.10 MiB/s)
		  | I211127 12:27:19.355141 36 ccl/workloadccl/fixture.go:502  [-] 4  imported 3.9 MiB in district table (40000 rows, 0 index entries, took 3.857797347s, 1.02 MiB/s)
		  | I211127 12:28:03.242859 40 ccl/workloadccl/fixture.go:502  [-] 5  imported 512 MiB in new_order table (36000000 rows, 0 index entries, took 47.745271297s, 10.72 MiB/s)
		  | I211127 12:35:29.172819 39 ccl/workloadccl/fixture.go:502  [-] 6  imported 6.3 GiB in order table (120000000 rows, 120000000 index entries, took 8m13.675248926s, 13.05 MiB/s)
		  | I211127 12:38:20.982594 38 ccl/workloadccl/fixture.go:502  [-] 7  imported 8.6 GiB in history table (120000000 rows, 0 index entries, took 11m5.485100375s, 13.24 MiB/s)
		  | I211127 13:06:22.477068 37 ccl/workloadccl/fixture.go:502  [-] 8  imported 69 GiB in customer table (120000000 rows, 120000000 index entries, took 39m6.979632045s, 30.12 MiB/s)
		  |
		  | stdout:
		Wraps: (4) secondary error attachment
		  | signal: killed
		  | (1) signal: killed
		  | Error types: (1) *exec.ExitError
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	monitor.go:128,import.go:154,import.go:179,test_runner.go:779: monitor failure: unexpected node event: 4: dead (exit status 1)
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
		  | main.(*monitorImpl).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:154
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:179
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:779
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 4: dead (exit status 1)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1343,context.go:91,cluster.go:1333,test_runner.go:867: dead node detection: 4: dead (exit status 1)

@cockroach-teamcity
Member Author

roachtest.import/tpcc/warehouses=4000/geo failed with artifacts on master @ 3b30a0e12f9a14b08ee8ad55b50299aca50c67a2:

		  | main.(*clusterImpl).Run
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:1964
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func1.1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:140
		  | main.(*monitorImpl).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:106
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_102427.388233965_n1_cockroach_workload_fixtures_import_tpcc
		Wraps: (3) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-3807326-1638083317-45-n8cpu16-geo:1 -- ./cockroach workload fixtures import tpcc --warehouses=4000 --csv-server='http://localhost:8081' returned
		  | stderr:
		  | I211128 10:24:29.603459 1 ccl/workloadccl/fixture.go:345  [-] 1  starting import of 9 tables
		  | I211128 10:24:31.407826 114 ccl/workloadccl/fixture.go:502  [-] 2  imported 213 KiB in warehouse table (4000 rows, 0 index entries, took 1.558061342s, 0.13 MiB/s)
		  | I211128 10:24:32.417979 115 ccl/workloadccl/fixture.go:502  [-] 3  imported 3.9 MiB in district table (40000 rows, 0 index entries, took 2.56814943s, 1.53 MiB/s)
		  | I211128 10:24:37.900683 120 ccl/workloadccl/fixture.go:502  [-] 4  imported 7.8 MiB in item table (100000 rows, 0 index entries, took 8.050522239s, 0.97 MiB/s)
		  | I211128 10:25:14.965016 119 ccl/workloadccl/fixture.go:502  [-] 5  imported 512 MiB in new_order table (36000000 rows, 0 index entries, took 45.114990498s, 11.35 MiB/s)
		  | I211128 10:30:43.269584 118 ccl/workloadccl/fixture.go:502  [-] 6  imported 6.3 GiB in order table (120000000 rows, 120000000 index entries, took 6m13.419565447s, 17.26 MiB/s)
		  | I211128 10:31:58.474313 117 ccl/workloadccl/fixture.go:502  [-] 7  imported 8.6 GiB in history table (120000000 rows, 0 index entries, took 7m28.624461099s, 19.64 MiB/s)
		  | I211128 10:53:23.602032 116 ccl/workloadccl/fixture.go:502  [-] 8  imported 69 GiB in customer table (120000000 rows, 120000000 index entries, took 28m53.752115004s, 40.77 MiB/s)
		  |
		  | stdout:
		Wraps: (4) secondary error attachment
		  | signal: killed
		  | (1) signal: killed
		  | Error types: (1) *exec.ExitError
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	monitor.go:128,import.go:154,import.go:179,test_runner.go:779: monitor failure: unexpected node event: 2: dead (exit status 1)
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
		  | main.(*monitorImpl).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:154
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:179
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:779
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 2: dead (exit status 1)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1343,context.go:91,cluster.go:1333,test_runner.go:867: dead node detection: 2: dead (exit status 1)

@cockroach-teamcity
Member Author

roachtest.import/tpcc/warehouses=4000/geo failed with artifacts on master @ 2c014c47c1a242f504f6d595bfd79c0edc20b90a:

		  | main.(*clusterImpl).Run
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:1964
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func1.1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:140
		  | main.(*monitorImpl).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:106
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_094742.601678535_n1_cockroach_workload_fixtures_import_tpcc
		Wraps: (3) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-3810278-1638169717-46-n8cpu16-geo:1 -- ./cockroach workload fixtures import tpcc --warehouses=4000 --csv-server='http://localhost:8081' returned
		  | stderr:
		  | I211129 09:47:44.742670 1 ccl/workloadccl/fixture.go:345  [-] 1  starting import of 9 tables
		  | I211129 09:47:47.341678 82 ccl/workloadccl/fixture.go:502  [-] 2  imported 213 KiB in warehouse table (4000 rows, 0 index entries, took 2.262144655s, 0.09 MiB/s)
		  | I211129 09:47:49.098800 83 ccl/workloadccl/fixture.go:502  [-] 3  imported 3.9 MiB in district table (40000 rows, 0 index entries, took 4.019169711s, 0.98 MiB/s)
		  | I211129 09:47:53.294153 88 ccl/workloadccl/fixture.go:502  [-] 4  imported 7.8 MiB in item table (100000 rows, 0 index entries, took 8.21394997s, 0.95 MiB/s)
		  | I211129 09:49:03.611118 87 ccl/workloadccl/fixture.go:502  [-] 5  imported 512 MiB in new_order table (36000000 rows, 0 index entries, took 1m18.531125758s, 6.52 MiB/s)
		  | I211129 09:53:00.513080 85 ccl/workloadccl/fixture.go:502  [-] 6  imported 8.6 GiB in history table (120000000 rows, 0 index entries, took 5m15.433185765s, 27.94 MiB/s)
		  | I211129 09:53:31.419518 86 ccl/workloadccl/fixture.go:502  [-] 7  imported 6.3 GiB in order table (120000000 rows, 120000000 index entries, took 5m46.339634176s, 18.60 MiB/s)
		  | I211129 10:10:23.285752 84 ccl/workloadccl/fixture.go:502  [-] 8  imported 69 GiB in customer table (120000000 rows, 120000000 index entries, took 22m38.205996984s, 52.04 MiB/s)
		  |
		  | stdout:
		Wraps: (4) secondary error attachment
		  | signal: killed
		  | (1) signal: killed
		  | Error types: (1) *exec.ExitError
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	monitor.go:128,import.go:154,import.go:179,test_runner.go:779: monitor failure: unexpected node event: 4: dead (exit status 1)
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
		  | main.(*monitorImpl).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:154
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:179
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:779
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 4: dead (exit status 1)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1343,context.go:91,cluster.go:1333,test_runner.go:867: dead node detection: 4: dead (exit status 1)

@cockroach-teamcity
Member Author

roachtest.import/tpcc/warehouses=4000/geo failed with artifacts on master @ 506d129f5f187134c35e2f71860490e044fde989:

		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:106
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_110241.198575948_n1_cockroach_workload_fixtures_import_tpcc
		Wraps: (3) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-3844585-1638602423-42-n8cpu16-geo:1 -- ./cockroach workload fixtures import tpcc --warehouses=4000 --csv-server='http://localhost:8081' returned
		  | stderr:
		  | I211204 11:02:43.394995 1 ccl/workloadccl/fixture.go:345  [-] 1  starting import of 9 tables
		  | I211204 11:02:44.646735 56 ccl/workloadccl/fixture.go:502  [-] 2  imported 213 KiB in warehouse table (4000 rows, 0 index entries, took 1.160933222s, 0.18 MiB/s)
		  | I211204 11:02:45.860676 57 ccl/workloadccl/fixture.go:502  [-] 3  imported 3.9 MiB in district table (40000 rows, 0 index entries, took 2.374799847s, 1.66 MiB/s)
		  | I211204 11:02:49.775880 62 ccl/workloadccl/fixture.go:502  [-] 4  imported 7.8 MiB in item table (100000 rows, 0 index entries, took 6.28923063s, 1.24 MiB/s)
		  | I211204 11:03:21.434103 61 ccl/workloadccl/fixture.go:502  [-] 5  imported 512 MiB in new_order table (36000000 rows, 0 index entries, took 37.947544913s, 13.49 MiB/s)
		  | I211204 11:24:55.766643 60 ccl/workloadccl/fixture.go:502  [-] 6  imported 6.3 GiB in order table (120000000 rows, 120000000 index entries, took 22m12.280249507s, 4.84 MiB/s)
		  | I211204 11:24:57.114632 59 ccl/workloadccl/fixture.go:502  [-] 7  imported 8.6 GiB in history table (120000000 rows, 0 index entries, took 22m13.628380698s, 6.61 MiB/s)
		  |
		  | stdout:
		Wraps: (4) secondary error attachment
		  | signal: killed
		  | (1) signal: killed
		  | Error types: (1) *exec.ExitError
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	monitor.go:128,import.go:154,import.go:179,test_runner.go:779: monitor failure: monitor task failed: read tcp 172.17.0.3:33990 -> 34.91.2.172:26257: read: connection reset by peer
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
		  | main.(*monitorImpl).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:154
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:179
		  | [...repeated from below...]
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:172
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (4) monitor task failed
		Wraps: (5) read tcp 172.17.0.3:33990 -> 34.91.2.172:26257
		Wraps: (6) read
		Wraps: (7) connection reset by peer
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *net.OpError (6) *os.SyscallError (7) syscall.Errno

	cluster.go:1339,context.go:91,cluster.go:1329,test_runner.go:867: dead node detection: 3: dead (exit status 8)

@tbg
Member

tbg commented Dec 15, 2021

Just confirming that this is still the same problem (inuse_space):
[Image]

https://share.polarsignals.com/122ec80/

Still need to mount a proper investigation & mitigation here. cc @lunevalex
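
For anyone unfamiliar with the `inuse_space` view mentioned above: it is one of the sample types in a Go heap profile and counts bytes that are still live, as opposed to `alloc_space`, which counts everything ever allocated. The following is a generic Go sketch of capturing such a profile in-process with the standard `runtime/pprof` package. It is not CockroachDB's own profiling tooling, and `captureHeapProfile` is a hypothetical name used only for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// captureHeapProfile is a hypothetical helper that returns the current
// heap profile in pprof's gzip-compressed protobuf format.
func captureHeapProfile() ([]byte, error) {
	// Run a GC first so the profile reflects up-to-date live-heap data.
	runtime.GC()
	var buf bytes.Buffer
	// debug=0 selects the binary format that `go tool pprof` reads.
	if err := pprof.Lookup("heap").WriteTo(&buf, 0); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := captureHeapProfile()
	if err != nil {
		fmt.Println("capture failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of heap profile\n", len(data))
}
```

If the bytes are written to a file, `go tool pprof -sample_index=inuse_space <file>` opens that file in the live-bytes view.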

tbg changed the title from "roachtest: import/tpcc/warehouses=4000/geo failed" to "roachtest: import/tpcc/warehouses=4000/geo failed [raft sideloading oom]" on Dec 15, 2021
@cockroach-teamcity
Member Author

roachtest.import/tpcc/warehouses=4000/geo failed with artifacts on master @ 65c6377f4f486d537ef7dfc59f8f9c78f1a6018a:

		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_103733.250287176_n1_cockroach_workload_fixtures_import_tpcc
		Wraps: (3) ./cockroach workload fixtures import tpcc --warehouses=4000 --csv-server='http://localhost:8081' returned
		  | stderr:
		  | I220103 10:37:34.834496 1 ccl/workloadccl/fixture.go:345  [-] 1  starting import of 9 tables
		  | I220103 10:37:37.344935 101 ccl/workloadccl/fixture.go:502  [-] 2  imported 213 KiB in warehouse table (4000 rows, 0 index entries, took 2.44111899s, 0.09 MiB/s)
		  | I220103 10:37:38.051259 102 ccl/workloadccl/fixture.go:502  [-] 3  imported 3.9 MiB in district table (40000 rows, 0 index entries, took 3.147412029s, 1.25 MiB/s)
		  | I220103 10:37:40.203729 107 ccl/workloadccl/fixture.go:502  [-] 4  imported 7.8 MiB in item table (100000 rows, 0 index entries, took 5.299431935s, 1.47 MiB/s)
		  | I220103 10:38:18.042929 106 ccl/workloadccl/fixture.go:502  [-] 5  imported 512 MiB in new_order table (36000000 rows, 0 index entries, took 43.138654739s, 11.87 MiB/s)
		  |
		  | stdout:
		Wraps: (4) secondary error attachment
		  | UNCLASSIFIED_PROBLEM: context canceled
		  | (1) UNCLASSIFIED_PROBLEM
		  | Wraps: (2) Node 1. Command with error:
		  |   | ``````
		  |   | ./cockroach workload fixtures import tpcc --warehouses=4000 --csv-server='http://localhost:8081'
		  |   | ``````
		  | Wraps: (3) context canceled
		  | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	monitor.go:127,import.go:156,import.go:181,test_runner.go:780: monitor failure: monitor task failed: read tcp 172.17.0.3:43456 -> 34.127.76.108:26257: read: connection reset by peer
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:156
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:181
		  | [...repeated from below...]
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:171
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (4) monitor task failed
		Wraps: (5) read tcp 172.17.0.3:43456 -> 34.127.76.108:26257
		Wraps: (6) read
		Wraps: (7) connection reset by peer
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *net.OpError (6) *os.SyscallError (7) syscall.Errno

tbg added the S-1 label on Feb 1, 2022
@cockroach-teamcity
Member Author

roachtest.import/tpcc/warehouses=4000/geo failed with artifacts on master @ 80f1c2ce09389f1d7e97376964d3f2a922405b1b:

		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_101027.584125819_n1_cockroach_workload_fixtures_import_tpcc
		Wraps: (3) ./cockroach workload fixtures import tpcc --warehouses=4000 --csv-server='http://localhost:8081' returned
		  | stderr:
		  | I220212 10:10:30.031777 1 ccl/workloadccl/fixture.go:318  [-] 1  starting import of 9 tables
		  | I220212 10:10:38.762618 82 ccl/workloadccl/fixture.go:474  [-] 2  imported 213 KiB in warehouse table (4000 rows, 0 index entries, took 2.97674353s, 0.07 MiB/s)
		  | I220212 10:10:40.151884 83 ccl/workloadccl/fixture.go:474  [-] 3  imported 3.9 MiB in district table (40000 rows, 0 index entries, took 4.365966401s, 0.90 MiB/s)
		  | I220212 10:10:41.401860 88 ccl/workloadccl/fixture.go:474  [-] 4  imported 7.9 MiB in item table (100000 rows, 0 index entries, took 5.615765641s, 1.40 MiB/s)
		  | I220212 10:11:22.107560 87 ccl/workloadccl/fixture.go:474  [-] 5  imported 546 MiB in new_order table (36000000 rows, 0 index entries, took 46.321491383s, 11.79 MiB/s)
		  | I220212 10:41:35.324266 85 ccl/workloadccl/fixture.go:474  [-] 6  imported 4.3 GiB in history table (60459000 rows, 0 index entries, took 30m59.53829347s, 2.39 MiB/s)
		  | I220212 10:49:20.274384 86 ccl/workloadccl/fixture.go:474  [-] 7  imported 4.1 GiB in order table (75000000 rows, 75000000 index entries, took 38m44.488363997s, 1.80 MiB/s)
		  |
		  | stdout:
		Wraps: (4) secondary error attachment
		  | UNCLASSIFIED_PROBLEM: context canceled
		  | (1) UNCLASSIFIED_PROBLEM
		  | Wraps: (2) Node 1. Command with error:
		  |   | ``````
		  |   | ./cockroach workload fixtures import tpcc --warehouses=4000 --csv-server='http://localhost:8081'
		  |   | ``````
		  | Wraps: (3) context canceled
		  | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	monitor.go:127,import.go:156,import.go:181,test_runner.go:779: monitor failure: monitor task failed: read tcp 172.17.0.3:52014 -> 35.227.187.3:26257: read: connection reset by peer
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:156
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func3
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:181
		  | [...repeated from below...]
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (4) monitor task failed
		Wraps: (5) read tcp 172.17.0.3:52014 -> 35.227.187.3:26257
		Wraps: (6) read
		Wraps: (7) connection reset by peer
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *net.OpError (6) *os.SyscallError (7) syscall.Errno

@cockroach-teamcity
Member Author

roachtest.import/tpcc/warehouses=4000/geo failed with artifacts on master @ 29716850b181718594663889ddb5f479fef7a305:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/import/tpcc/warehouses=4000/geo/run_1
	cluster.go:1868,import.go:120,import.go:181,test_runner.go:875: one or more parallel execution failure
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/roachprod/install.(*SyncedCluster).ParallelE
		  | 	github.com/cockroachdb/cockroach/pkg/roachprod/install/cluster_synced.go:2042
		  | github.com/cockroachdb/cockroach/pkg/roachprod/install.(*SyncedCluster).Parallel
		  | 	github.com/cockroachdb/cockroach/pkg/roachprod/install/cluster_synced.go:1923
		  | github.com/cockroachdb/cockroach/pkg/roachprod/install.(*SyncedCluster).Start
		  | 	github.com/cockroachdb/cockroach/pkg/roachprod/install/cockroach.go:167
		  | github.com/cockroachdb/cockroach/pkg/roachprod.Start
		  | 	github.com/cockroachdb/cockroach/pkg/roachprod/roachprod.go:660
		  | main.(*clusterImpl).StartE
		  | 	main/pkg/cmd/roachtest/cluster.go:1826
		  | main.(*clusterImpl).Start
		  | 	main/pkg/cmd/roachtest/cluster.go:1867
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:120
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func3
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:181
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:875
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (2) one or more parallel execution failure
		Error types: (1) *withstack.withStack (2) *errutil.leafError

@github-actions

We have marked this test failure issue as stale because it has been
inactive for 1 month. If this failure is still relevant, removing the
stale label or adding a comment will keep it active. Otherwise,
we'll close it in 5 days to keep the test failure queue tidy.
