roachtest: import/tpcc/warehouses=4000/geo failed [raft sideload oom] #76824

Closed
cockroach-teamcity opened this issue Feb 20, 2022 · 9 comments
Labels
branch-release-21.2: Used to mark GA and release blockers, technical advisories, and bugs for 21.2
C-test-failure: Broken test (automatically or manually discovered).
no-test-failure-activity
O-roachtest
O-robot: Originated from a bot.
S-3: Medium-low impact: incurs increased costs for some users (incl lower avail, recoverable bad data)
X-nostale: Marks an issue/pr that should be ignored by the stale bot

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Feb 20, 2022

roachtest.import/tpcc/warehouses=4000/geo failed with artifacts on release-21.2 @ 0bb1218f1c16dbebda16ace42d2d682b22aa3c96:

		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
		  | main.(*monitorImpl).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:134
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:159
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:777
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 7: dead (exit status 137)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1296,context.go:89,cluster.go:1284,test_runner.go:866: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-4417779-1645341221-47-n8cpu16-geo --oneshot --ignore-empty-nodes: exit status 1
		8: 10189
		1: 10322
		2: 10218
		4: 10031
		3: 10271
		7: dead (exit status 137)
		5: 10683
		6: 9880
		Error: UNCLASSIFIED_PROBLEM: 7: dead (exit status 137)
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/roachprod.Monitor
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/roachprod/roachprod.go:596
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:569
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:123
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1170
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:255
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (3) 7: dead (exit status 137)
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError
Reproduce

See: roachtest README

Same failure on other branches

/cc @cockroachdb/bulk-io

This test on roachdash | Improve this report!

Jira issue: CRDB-13294

Epic CRDB-15069

cockroach-teamcity added the branch-release-21.2, C-test-failure, O-roachtest, O-robot, and release-blocker labels on Feb 20, 2022
@adityamaru
Contributor

Node 7 was OOM-killed.
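
For readers unfamiliar with the "exit status 137" in the monitor output: by shell convention, a process terminated by signal N is reported with exit status 128+N, so 137 corresponds to signal 9 (SIGKILL on Linux), which is what the kernel OOM killer sends. The following is a minimal sketch, not part of roachtest or roachprod, that pulls dead nodes out of monitor-style lines and decodes the signal. The names parseDeadNode and signalFromExitStatus are hypothetical, and the line format is assumed from the output quoted in this issue.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// deadLine matches monitor-style lines such as "7: dead (exit status 137)".
// The format is assumed from the output quoted in this issue.
var deadLine = regexp.MustCompile(`^\s*(\d+): dead \(exit status (\d+)\)\s*$`)

// parseDeadNode (hypothetical helper) returns the node ID and exit status
// if the line reports a dead node.
func parseDeadNode(line string) (node, status int, ok bool) {
	m := deadLine.FindStringSubmatch(line)
	if m == nil {
		return 0, 0, false
	}
	node, _ = strconv.Atoi(m[1])
	status, _ = strconv.Atoi(m[2])
	return node, status, true
}

// signalFromExitStatus (hypothetical helper) applies the 128+N shell
// convention: an exit status above 128 means "killed by signal status-128".
func signalFromExitStatus(status int) (sig int, ok bool) {
	if status <= 128 || status > 255 {
		return 0, false
	}
	return status - 128, true
}

func main() {
	// Sample lines copied from the first failure report above.
	output := "1: 10322\n2: 10218\n7: dead (exit status 137)\n5: 10683"
	for _, line := range strings.Split(output, "\n") {
		node, status, ok := parseDeadNode(line)
		if !ok {
			continue
		}
		if sig, killed := signalFromExitStatus(status); killed {
			fmt.Printf("node %d exited with status %d (signal %d)\n", node, status, sig)
		}
	}
}
```

A SIGKILL alone does not prove an OOM kill; the kernel log on the affected node (for example via dmesg) is the usual place to confirm it.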

@adityamaru
Contributor

adityamaru commented Feb 21, 2022

Latest heap profile from node 7. The allocation in maybeInlineSideloadedRaftCommand looks suspicious.

[Image: screenshot of the node 7 profile, 2022-02-21 10:06:35 AM]

[Image: screenshot of the node 7 profile, 2022-02-21 10:06:16 AM]

@msbutler
Collaborator

msbutler commented Feb 23, 2022

@adityamaru I'm removing the release-blocker label, given that this roachtest has passed for the past few days. I'll keep looking through the logs to determine whether we should hand this off to another team.

msbutler removed the release-blocker label on Feb 23, 2022
msbutler removed their assignment on Apr 1, 2022
@msbutler
Collaborator

msbutler commented Apr 1, 2022

Aditya's heap profiles above indicate that this is a duplicate of #70307 on release-21.2. Reassigning to KV to track.

msbutler added the T-kv label on Apr 1, 2022
@github-actions

github-actions bot commented May 2, 2022

We have marked this test failure issue as stale because it has been
inactive for 1 month. If this failure is still relevant, removing the
stale label or adding a comment will keep it active. Otherwise,
we'll close it in 5 days to keep the test failure queue tidy.

@erikgrinaker
Contributor

Related to ongoing work in #73376.

erikgrinaker added the X-nostale and T-kv-replication labels and removed the T-disaster-recovery and T-kv labels on May 5, 2022
erikgrinaker changed the title from "roachtest: import/tpcc/warehouses=4000/geo failed" to "roachtest: import/tpcc/warehouses=4000/geo failed [raft sideload oom]" on May 5, 2022
@cockroach-teamcity
Member Author

roachtest.import/tpcc/warehouses=4000/geo failed with artifacts on release-21.2 @ c2a7c3beee18554abd09a06d5c647b71da780827:

		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
		  | main.(*monitorImpl).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:134
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:159
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:777
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1571
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 3: dead (exit status 137)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1296,context.go:91,cluster.go:1284,test_runner.go:866: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-6019678-1660024331-48-n8cpu16-geo --oneshot --ignore-empty-nodes: exit status 1
		7: 11190
		8: 11065
		2: 10923
		1: 11040
		4: 10646
		3: dead (exit status 137)
		6: 10445
		5: 10822
		Error: UNCLASSIFIED_PROBLEM: 3: dead (exit status 137)
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/roachprod.Monitor
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/roachprod/roachprod.go:596
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:569
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:123
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1170
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:250
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1571
		Wraps: (3) 3: dead (exit status 137)
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError

@cockroach-teamcity
Member Author

roachtest.import/tpcc/warehouses=4000/geo failed with artifacts on release-21.2 @ c27f55e102a8e439b9f13cc847fff039a7eda55a:

		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
		  | main.(*monitorImpl).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:134
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerImportTPCC.func3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/import.go:159
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:777
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1571
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 6: dead (exit status 137)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1296,context.go:91,cluster.go:1284,test_runner.go:866: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-6415653-1662789247-44-n8cpu16-geo --oneshot --ignore-empty-nodes: exit status 1
		7: 11417
		8: 11210
		1: 11662
		2: 11336
		4: 11595
		3: 11090
		5: 10685
		6: dead (exit status 137)
		Error: UNCLASSIFIED_PROBLEM: 6: dead (exit status 137)
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/roachprod.Monitor
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/roachprod/roachprod.go:596
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:569
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:123
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1170
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:250
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1571
		Wraps: (3) 6: dead (exit status 137)
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError

@erikgrinaker
Contributor

Mitigated by #88990.

exalate-issue-sync bot added the branch-release-21.2 label on Apr 16, 2024