roachtest: kv/quiescence/nodes=3 failed #104940

Closed
cockroach-teamcity opened this issue Jun 15, 2023 · 5 comments
Labels
A-testing Testing tools and infrastructure branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-testeng TestEng Team

@cockroach-teamcity
Member

cockroach-teamcity commented Jun 15, 2023

roachtest.kv/quiescence/nodes=3 failed with artifacts on release-23.1 @ d76049d66d6e3a2cd398d0347a63dd29cc9b928a:

test artifacts and logs in: /artifacts/kv/quiescence/nodes=3/run_1
(test_runner.go:1150).func1: 1 dead node(s) detected

Parameters: ROACHTEST_arch=amd64 , ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/kv-triage


Jira issue: CRDB-28784

@cockroach-teamcity cockroach-teamcity added branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-kv KV Team labels Jun 15, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Jun 15, 2023
@kvoli
Collaborator

kvoli commented Jun 15, 2023

Test succeeds but then fails the dead node detector:

06:29:07 kv.go:461: QPS went from 2968.85 to 2921 with one node down
06:29:07 test_runner.go:1066: tearing down after success; see teardown.log

The test explicitly kills a node, but teardown doesn't account for that and treats the dead node as a failure.

Same situation as #102054, which was supposed to be resolved by #102162. We are still hitting this, though.

Removing release blocker for now.

@kvoli kvoli added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-testing Testing tools and infrastructure and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jun 15, 2023
@kvoli kvoli self-assigned this Jun 15, 2023
@erikgrinaker
Contributor

erikgrinaker commented Jun 15, 2023

I don't think #102162 was backported to 23.1 -- and if it was, maybe these tests weren't opted out. That would also explain the failover disk stall test failures on 23.1: #104960 and #104961.

@kvoli kvoli added T-testeng TestEng Team and removed T-kv KV Team labels Jun 15, 2023
@blathers-crl

blathers-crl bot commented Jun 15, 2023

cc @cockroachdb/test-eng

@kvoli kvoli assigned smg260 and unassigned kvoli Jun 15, 2023
@kvoli
Collaborator

kvoli commented Jun 15, 2023

Yup, that looks like it. I don't see the skip teardown option on 23.1:

Cluster: r.MakeClusterSpec(4),
Leases: registry.MetamorphicLeases,
Run: func(ctx context.Context, t test.Test, c cluster.Cluster) {

Whereas I do see it on master:

SkipPostValidations: registry.PostValidationNoDeadNodes,

Re-assigning to test-eng @smg260 - I think the commit just needs a backport to 23.1. However, I'm unsure why this only started failing again now.
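
For anyone skimming, here is a rough, self-contained sketch of how an opt-out like this could behave. It is not the roachtest code: only the names `SkipPostValidations` and `PostValidationNoDeadNodes` come from the snippet above, while `TestSpec`, `PostValidation`, and `postValidate` are hypothetical stand-ins, and the bit-flag representation is an assumption.

```go
package main

import "fmt"

// PostValidation is a hypothetical bit flag naming a check run after a test.
type PostValidation int

const (
	// PostValidationNoDeadNodes reuses the name seen on master; its type
	// and value here are assumptions made for illustration only.
	PostValidationNoDeadNodes PostValidation = 1 << iota
)

// TestSpec is a simplified stand-in for a test registration.
type TestSpec struct {
	Name                string
	SkipPostValidations PostValidation
}

// postValidate models the teardown check: it reports an error when dead
// nodes are present, unless the spec opted out of that validation.
func postValidate(spec TestSpec, deadNodes int) error {
	if spec.SkipPostValidations&PostValidationNoDeadNodes != 0 {
		return nil // the test kills nodes on purpose, so skip the check
	}
	if deadNodes > 0 {
		return fmt.Errorf("%d dead node(s) detected", deadNodes)
	}
	return nil
}

func main() {
	withoutSkip := TestSpec{Name: "kv/quiescence/nodes=3"}
	withSkip := TestSpec{
		Name:                "kv/quiescence/nodes=3",
		SkipPostValidations: PostValidationNoDeadNodes,
	}

	fmt.Println(postValidate(withoutSkip, 1)) // 1 dead node(s) detected
	fmt.Println(postValidate(withSkip, 1))    // <nil>
}
```

In this model the test body can pass and teardown still fails when the flag is missing, which matches the log above: "tearing down after success" followed by "1 dead node(s) detected".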

@smg260
Contributor

smg260 commented Jun 15, 2023

This failed today because several roachtest changes were backported via #104882 yesterday, including one that fixed the dead node detector. We missed the subsequent PR (mentioned above), which allows a test to skip the "no dead node" assertion.

#104977 should resolve this.
