roachtest: c2c/shutdown/dest/coordinator failed #100907

Closed
cockroach-teamcity opened this issue Apr 7, 2023 · 2 comments · Fixed by #100916
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-disaster-recovery

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Apr 7, 2023

roachtest.c2c/shutdown/dest/coordinator failed with artifacts on master @ 8e6f530644eb35313a479af529b1f672c12dca38:

test artifacts and logs in: /artifacts/c2c/shutdown/dest/coordinator/run_1
(assertions.go:262).Fail: 
	Error Trace:	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cluster_to_cluster.go:514
	            				github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cluster_to_cluster.go:623
	            				github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cluster_to_cluster.go:1005
	            				main/pkg/cmd/roachtest/monitor.go:105
	            				golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75
	            				GOROOT/src/runtime/asm_amd64.s:1594
	Error:      	Received unexpected error:
	            	expected job status succeeded, but got running
	            	(1) attached stack trace
	            	  -- stack trace:
	            	  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*replicationTestSpec).stopReplicationStream.func1
	            	  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cluster_to_cluster.go:510
	            	  | github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
	            	  | 	github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:213
	            	  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*replicationTestSpec).stopReplicationStream
	            	  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cluster_to_cluster.go:497
	            	  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*replicationTestSpec).main
	            	  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cluster_to_cluster.go:623
	            	  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerClusterReplicationResilience.func1.2
	            	  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cluster_to_cluster.go:1005
	            	  | main.(*monitorImpl).Go.func1
	            	  | 	main/pkg/cmd/roachtest/monitor.go:105
	            	  | golang.org/x/sync/errgroup.(*Group).Go.func1
	            	  | 	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75
	            	  | runtime.goexit
	            	  | 	GOROOT/src/runtime/asm_amd64.s:1594
	            	Wraps: (2) expected job status succeeded, but got running
	            	Error types: (1) *withstack.withStack (2) *errutil.leafError
	Test:       	c2c/shutdown/dest/coordinator
(require.go:1264).NoError: FailNow called
(monitor.go:127).Wait: monitor failure: monitor task failed: context canceled while waiting for job to finish: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=8 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/disaster-recovery

This test on roachdash | Improve this report!

Jira issue: CRDB-26688

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Apr 7, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Apr 7, 2023
@cockroach-teamcity
Member Author

roachtest.c2c/shutdown/dest/coordinator failed with artifacts on master @ 8e6f530644eb35313a479af529b1f672c12dca38:

	            	monitor failure: monitor task failed: t.Fatal() was called
	            	(1) attached stack trace
	            	  -- stack trace:
	            	  | main.(*monitorImpl).WaitE
	            	  | 	main/pkg/cmd/roachtest/monitor.go:115
	            	  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*replicationTestSpec).compareTenantFingerprintsAtTimestamp
	            	  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cluster_to_cluster.go:553
	            	  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*replicationTestSpec).main
	            	  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cluster_to_cluster.go:629
	            	  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerClusterReplicationResilience.func1.2
	            	  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cluster_to_cluster.go:1005
	            	  | main.(*monitorImpl).Go.func1
	            	  | 	main/pkg/cmd/roachtest/monitor.go:105
	            	  | golang.org/x/sync/errgroup.(*Group).Go.func1
	            	  | 	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75
	            	Wraps: (2) monitor failure
	            	Wraps: (3) attached stack trace
	            	  -- stack trace:
	            	  | main.(*monitorImpl).wait.func2
	            	  | 	main/pkg/cmd/roachtest/monitor.go:171
	            	Wraps: (4) monitor task failed
	            	Wraps: (5) attached stack trace
	            	  -- stack trace:
	            	  | main.(*monitorImpl).Go.func1.1
	            	  | 	main/pkg/cmd/roachtest/monitor.go:101
	            	  | runtime.gopanic
	            	  | 	GOROOT/src/runtime/panic.go:884
	            	  | main.(*testImpl).Fatalf
	            	  | 	main/pkg/cmd/roachtest/test_impl.go:298
	            	  | github.com/cockroachdb/cockroach/pkg/testutils/sqlutils.(*Row).Scan
	            	  | 	github.com/cockroachdb/cockroach/pkg/testutils/sqlutils/sql_runner.go:218
	            	  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*replicationTestSpec).compareTenantFingerprintsAtTimestamp.func1
	            	  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cluster_to_cluster.go:538
	            	  | main.(*monitorImpl).Go.func1
	            	  | 	main/pkg/cmd/roachtest/monitor.go:105
	            	  | [...repeated from below...]
	            	Wraps: (6) attached stack trace
	            	  -- stack trace:
	            	  | main.init
	            	  | 	main/pkg/cmd/roachtest/monitor.go:80
	            	  | runtime.doInit
	            	  | 	GOROOT/src/runtime/proc.go:6348
	            	  | runtime.main
	            	  | 	GOROOT/src/runtime/proc.go:233
	            	  | runtime.goexit
	            	  | 	GOROOT/src/runtime/asm_amd64.s:1594
	            	Wraps: (7) t.Fatal() was called
	            	Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *withstack.withStack (7) *errutil.leafError
	Test:       	c2c/shutdown/dest/coordinator
(require.go:1264).NoError: FailNow called

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=8 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0


@msbutler msbutler self-assigned this Apr 7, 2023
@msbutler
Collaborator

msbutler commented Apr 7, 2023

There's a bug in the test infra that incorrectly selects a source cluster node to shut down instead of a destination cluster node. The failure that results from shutting down the source cluster node is the same as #100572.

@msbutler msbutler removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Apr 7, 2023
craig bot pushed a commit that referenced this issue Apr 10, 2023
100618: builtins: remove WaitPolicy error from fingerprinting builtin r=dt a=adityamaru

During development, the wait policy for ExportRequests sent during fingerprinting was set to error. This meant that if an ExportRequest encountered an intent, it would immediately return with an error. That would be fine if we had a retry loop, similar to how the backup processor requeues the request to be sent later, by which time the intent may have been resolved, but as is this is incorrect. We change the WaitPolicy to block so that the ExportRequest blocks until the intent is resolved. If in the future we want to make progress while another ExportRequest is stuck resolving intents, we can rework this logic to look more like our backup strategy.

Release note: None
Epic: none

100703: sql: add new persistedV22_2 views r=maryliag a=maryliag

Note to reviewers: this is a forward port of #100673

Fixes #100501

Adds the {statement|transaction}_statistics_persisted_v22_2 views, as was done in #96454. The cluster version is checked before deciding which view to use. This is required for mixed-version clusters running 22.2 and 23.1.

Release note: None

100734: backupccl: gate writing slim manifests in backup via cluster setting r=rhu713 a=rhu713

Feature gate the writing of slim manifests at the end of backup jobs with
the cluster setting `backup.write_metadata_with_external_ssts.enabled`.

Release note: None

100916: roachtest: fix c2c/shutdown/dest bug r=dt,adityamaru a=msbutler

Previously, the c2c/shutdown/dest tests were incorrectly shutting down nodes on the source cluster. This patch fixes this.

Fixes #100907

Release note: None

101057: sqlproxyccl: skip TestDirectoryConnect r=adityamaru a=knz

Informs #76839.

Release note: None
Epic: None

Co-authored-by: adityamaru
Co-authored-by: maryliag
Co-authored-by: Rui Hu
Co-authored-by: Michael Butler
Co-authored-by: Raphael 'kena' Poss
@craig craig bot closed this as completed in 6dfd113 Apr 10, 2023
msbutler added a commit to msbutler/cockroach that referenced this issue Apr 18, 2023
Previously, the c2c/shutdown/dest tests were incorrectly shutting down nodes on
the source cluster. This patch fixes this.

Fixes cockroachdb#100907

Release note: None