roachtest: cdc/pubsub-sink/assume-role failed #88936

Closed
cockroach-teamcity opened this issue Sep 28, 2022 · 6 comments
Labels
branch-release-22.2, C-test-failure, GA-blocker, O-roachtest, O-robot, T-cdc

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Sep 28, 2022

roachtest.cdc/pubsub-sink/assume-role failed with artifacts on release-22.2 @ 08974d4b0433a14aa83251a44df9659eb8e3ae65:

test artifacts and logs in: /artifacts/cdc/pubsub-sink/assume-role/run_1
	monitor.go:127,cdc.go:300,cdc.go:820,test_runner.go:908: monitor failure: monitor task failed: dial tcp 34.75.140.93:26257: connect: connection refused
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.cdcBasicTest
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cdc.go:300
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerCDC.func9
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cdc.go:820
		  | [...repeated from below...]
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1594
		Wraps: (4) monitor task failed
		Wraps: (5) dial tcp 34.75.140.93:26257
		Wraps: (6) connect
		Wraps: (7) connection refused
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *net.OpError (6) *os.SyscallError (7) syscall.Errno

	test_runner.go:1039,test_runner.go:938: test timed out (0s)

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0
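
The "Wraps" chain above ends in a `syscall.Errno`, so a caller can test for a refused connection without parsing the message. The following is a minimal sketch using only the Go standard library; it rebuilds a similar chain by hand (nothing is dialed), and the helper names `simulatedDialError` and `isConnRefused` are made up for illustration. It is not taken from the roachtest source.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

// simulatedDialError builds an error chain shaped like the one in the report:
// prefix -> *net.OpError -> *os.SyscallError -> syscall.Errno.
// The address is copied from the report; no connection is attempted.
func simulatedDialError() error {
	opErr := &net.OpError{
		Op:   "dial",
		Net:  "tcp",
		Addr: &net.TCPAddr{IP: net.ParseIP("34.75.140.93"), Port: 26257},
		Err:  os.NewSyscallError("connect", syscall.ECONNREFUSED),
	}
	return fmt.Errorf("monitor failure: monitor task failed: %w", opErr)
}

// isConnRefused reports whether any error in the chain is ECONNREFUSED.
// errors.Is walks the chain through each layer's Unwrap method.
func isConnRefused(err error) bool {
	return errors.Is(err, syscall.ECONNREFUSED)
}

func main() {
	err := simulatedDialError()
	fmt.Println(err)
	fmt.Println("connection refused:", isConnRefused(err))
}
```

A check like this only tells you that the node was not listening on its SQL port; it does not say why the node went down.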

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/cdc

This test on roachdash | Improve this report!

Jira issue: CRDB-20046

Epic CRDB-11732

cockroach-teamcity added the branch-release-22.2, C-test-failure, O-roachtest, O-robot, and release-blocker labels Sep 28, 2022
cockroach-teamcity added this to the 22.2 milestone Sep 28, 2022
blathers-crl bot added the T-cdc label Sep 28, 2022
@cockroach-teamcity
Member Author

roachtest.cdc/pubsub-sink/assume-role failed with artifacts on release-22.2 @ 860584a59dee73d7a66ce882c668cef6eb2556f7:

test artifacts and logs in: /artifacts/cdc/pubsub-sink/assume-role/run_1
	monitor.go:127,cdc.go:300,cdc.go:820,test_runner.go:908: monitor failure: monitor task failed: dial tcp 35.231.70.69:26257: connect: connection refused
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.cdcBasicTest
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cdc.go:300
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerCDC.func9
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cdc.go:820
		  | [...repeated from below...]
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1594
		Wraps: (4) monitor task failed
		Wraps: (5) dial tcp 35.231.70.69:26257
		Wraps: (6) connect
		Wraps: (7) connection refused
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *net.OpError (6) *os.SyscallError (7) syscall.Errno

	test_runner.go:1039,test_runner.go:938: test timed out (0s)

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

miretskiy removed the release-blocker label Oct 3, 2022
@miretskiy
Contributor

Something wonky is going on here: the logs indicate that the test started at 07:34 but the nodes received SIGQUIT at 17:34, 10 hours later. It feels like an issue with either the cdc test or roachtest itself.

@renatolabs -- do you see anything interesting in the logs (that I don't see)? Or perhaps you know of some existing issue?

@renatolabs
Contributor

renatolabs commented Oct 3, 2022

@miretskiy the nodes are killed because the test exceeded the 10-hour maximum duration that a roachtest can take.
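
As a quick sanity check on the timestamps quoted earlier in the thread (07:34 start, 17:34 SIGQUIT), the gap can be computed and compared against the 10-hour limit. This is a small illustrative sketch; `maxTestDuration` and `elapsedBetween` are hypothetical names, not roachtest identifiers, and both readings are assumed to fall on the same day.

```go
package main

import (
	"fmt"
	"time"
)

// maxTestDuration mirrors the 10-hour limit mentioned in this thread.
// The constant name is invented for this sketch.
const maxTestDuration = 10 * time.Hour

// elapsedBetween parses two same-day HH:MM clock readings and returns the gap.
func elapsedBetween(start, end string) (time.Duration, error) {
	s, err := time.Parse("15:04", start)
	if err != nil {
		return 0, err
	}
	e, err := time.Parse("15:04", end)
	if err != nil {
		return 0, err
	}
	return e.Sub(s), nil
}

func main() {
	d, err := elapsedBetween("07:34", "17:34")
	if err != nil {
		panic(err)
	}
	// prints "elapsed 10h0m0s, limit reached: true"
	fmt.Printf("elapsed %v, limit reached: %t\n", d, d >= maxTestDuration)
}
```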

Interestingly, comparing the logs with the source for this test, we never see the message that is printed when the changefeed is created, which means the changefeed was never created successfully. The logs also show that:

  1. the tpcc workload finishes without problems
  2. cluster settings set by the test are properly updated

So the test hangs while running the statement:

fmt.Sprintf("CREATE CHANGEFEED FOR %s INTO $1", cfc.targets)

Code: https://github.com/cockroachdb/cockroach/blob/release-22.2/pkg/cmd/roachtest/tests/cdc.go#L1815-L1834.

I wonder if #88289 did not fully fix the issue and there is still a possibility of deadlock?
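
One way to make a hang like this surface quickly, rather than at the 10-hour limit, is to run the statement under a context deadline. The sketch below is an assumption-laden illustration, not the test's actual code: `execFunc` stands in for something like `(*sql.DB).ExecContext`, `hangingExec` simulates a statement that never returns on its own, the sink URI argument for `$1` is omitted, and `tpcc.warehouse` is a hypothetical target.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// execFunc is a hypothetical stand-in for a SQL executor such as
// (*sql.DB).ExecContext, reduced to what this sketch needs.
type execFunc func(ctx context.Context, stmt string) error

// createChangefeed builds the statement quoted in the thread and runs it
// under a deadline, so a hang becomes an error instead of a stuck test.
func createChangefeed(ctx context.Context, exec execFunc, targets string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	stmt := fmt.Sprintf("CREATE CHANGEFEED FOR %s INTO $1", targets)
	if err := exec(ctx, stmt); err != nil {
		return fmt.Errorf("running %q: %w", stmt, err)
	}
	return nil
}

// hangingExec simulates a deadlocked statement: it blocks until the
// context is cancelled or its deadline passes.
func hangingExec(ctx context.Context, _ string) error {
	<-ctx.Done()
	return ctx.Err()
}

// isDeadlineExceeded reports whether err was caused by the deadline expiring.
func isDeadlineExceeded(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

// runDemo runs the simulated hang with a short timeout.
func runDemo() error {
	return createChangefeed(context.Background(), hangingExec, "tpcc.warehouse", 50*time.Millisecond)
}

func main() {
	err := runDemo()
	fmt.Println("timed out:", isDeadlineExceeded(err))
	fmt.Println(err)
}
```

A deadline only helps if the underlying driver and server honor cancellation; it reports the hang but does not explain the suspected deadlock.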

@miretskiy
Contributor

@HonoreDB -- mind taking a look at this, in case #88289 did not fully fix the issue?

@miretskiy
Contributor

@HonoreDB Any updates on this? No failures in the past 5 days...

@HonoreDB
Contributor

Yup, this was fixed by cdcb1f18bbf865a6e749742e2cd25f1dbe2410fa
