roachtest: decommission/randomized failed #61692

Closed
cockroach-teamcity opened this issue Mar 9, 2021 · 3 comments
Labels
C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.

Comments

@cockroach-teamcity
Member

(roachtest).decommission/randomized failed on release-20.2@4da6dc378d563472d59166167ce760756e76f893:

The test failed on branch=release-20.2, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/decommission/randomized/run_1
	test_runner.go:814: test timed out (10m0s)

Artifacts: /decommission/randomized

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added branch-release-20.2 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Mar 9, 2021
@cockroach-teamcity cockroach-teamcity added this to the 20.2 milestone Mar 9, 2021
@knz
Contributor

knz commented Mar 16, 2021

This seems like a legitimate failure: a decommission operation did not complete, which caused the test to fail with a timeout.

I am a bit confused by the test artifacts: the test is meant to decommission two nodes, n4 and n5, but I only see log directories named "5" and "6" (unsure whether these relate to the node IDs...).

cc @irfansharif @erikgrinaker for triage

@erikgrinaker
Contributor

This is probably #56718, which was fixed by #61356 but not backported to 20.2. I'll look into it.

@tbg
Member

tbg commented Mar 19, 2021

Looks like n2 spontaneously combusted, so the decommission hung because the replicas didn't have anywhere to move to:

teardown: 09:20:25 cluster.go:1626: ./cockroach debug zip failed: 2: pghosts: GetInternalIP: failed to execute hostname on teamcity-2757410-1615273709-37-n6cpu4:3:: exit status 255

@tbg tbg closed this as completed Mar 19, 2021