Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv/kvserver: TestDecommission failed #106096

Closed
cockroach-teamcity opened this issue Jul 4, 2023 · 2 comments · Fixed by #107110
Closed

kv/kvserver: TestDecommission failed #106096

cockroach-teamcity opened this issue Jul 4, 2023 · 2 comments · Fixed by #107110
Assignees
Labels
A-testing Testing tools and infrastructure branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-kv KV Team
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Jul 4, 2023

kv/kvserver.TestDecommission failed with artifacts on release-23.1 @ a26ee7efd223c92cda531152837302f03a14f461:

=== RUN   TestDecommission
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/33e1d369c27b9c01b2b6009c561815a3/logTestDecommission3124719001
    test_log_scope.go:79: use -show-logs to present logs inline
    client_raft_test.go:3305: condition failed to evaluate within 45s: still a replica on s1: r64:/{Table/Max-Max} [(n1,s1):1, (n4,s4):2, (n5,s5):3, next=4, gen=5, sticky=9223372036.854775807,2147483647]
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/33e1d369c27b9c01b2b6009c561815a3/logTestDecommission3124719001
--- FAIL: TestDecommission (92.87s)

Parameters: TAGS=bazel,gss,deadlock

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/replication

This test on roachdash | Improve this report!

Jira issue: CRDB-29396

@cockroach-teamcity cockroach-teamcity added branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-kv-replication labels Jul 4, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Jul 4, 2023
@erikgrinaker erikgrinaker added T-kv KV Team and removed T-kv-replication labels Jul 4, 2023
@tbg
Copy link
Member

tbg commented Jul 4, 2023

Deadlock build, so maybe just needs to be not stressed under deadlock.

tbg added a commit to tbg/cockroach that referenced this issue Jul 4, 2023
`client_raft_test.go` is owned by the Replication team.

Noticed in cockroachdb#106096.

Epic: none
Release note: None
@erikgrinaker
Copy link
Contributor

It's already skipped under race, makes sense to skip under deadlock too.

craig bot pushed a commit that referenced this issue Jul 4, 2023
106105: kvserver: mode TestDecommission to kv-owned file r=erikgrinaker a=tbg

`client_raft_test.go` is owned by the Replication team.

Noticed in #106096.

Epic: none
Release note: None


Co-authored-by: Tobias Grieger <[email protected]>
@andrewbaptist andrewbaptist added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-testing Testing tools and infrastructure labels Jul 19, 2023
craig bot pushed a commit that referenced this issue Jul 19, 2023
107110: kvserver: avoid running intensive decommission test under deadlock r=AlexTalks a=AlexTalks

The kvserver test `TestDecommission`, which runs a 5-node cluster and decommissions 4 of those 5 nodes, has trouble completing fast enough when under a race or deadlock configuration. While race configurations were already skipped, this modifies the test to be skipped under deadlock configurations as well.

Fixes: #106096

Release note: None

107111: compose: start `docker-compose` with a non-empty `PATH` r=rail a=rickystewart

`docker-compose` invokes `docker`, but obviously this will fail if there is nothing in the `PATH`.

Epic: none
Release note: None

107129: dev: reject builds when CRL JS dependencies are 'pnpm link'ed r=rickystewart a=sjbarag

When working with first-party JS dependencies that aren't in this monorepo, the idiomatic development workflow uses pnpm link [1] to link multiple JS packages together. Specifically, running `pnpm link /path/to/github.com/cockroachdb/ui/packages/foo` from within pkg/ui/workspaces/* creates a symbolic link at
node_modules/`@cockroachlabs/foo` that points to
../../(…)/ui/packages/foo. This works quite smoothly for local development, as changes in the 'foo' package are automatically seen by a `pnpm webpack --watch` running in CRDB. The two packages act like they're maintained in the same repo, while allowing independent version history, CI processes, etc.

Unfortunately, rules_js currently offers no way to link outside of the Bazel workspace. Such a symlink is either ignored by rules_js (since it doesn't appear in pnpm-lock.yaml) or is rejected if it's manually added to the lockfile [2]. Allowing Bazel-based builds of CockroachDB when 'pnpm link'ed packages are present introduces dependency skew between Bazel and non-Bazel builds. To allow `pnpm link` to be used safely, pre-emptively reject builds of 'cockroach' and 'cockroach-oss' that are run through the 'dev' helper when linked `@cockroachlabs/` packages are detected.

[1] https://pnpm.io/cli/link
[2] aspect-build/rules_js#1165

Release note: None
Epic: none

-----

Example output: 
<img width="998" alt="image" src="https://github.com/cockroachdb/cockroach/assets/665775/3fd43abe-f5c2-4ddd-bc60-16a73db12836">

Total duration is 16ms on my machine since we're checking so few files. We can drop these into a goroutine per first-party JS if we want, but this is certainly fast enough to be conversational.

107149: opt: make functional dependency calculation deterministic r=DrewKimball a=DrewKimball

We recently added additional logic for inferring functional dependencies for join expressions. However, this logic iterates through a map, which leads to nondeterminism in which order functional dependencies are added to the FD set. Functional dependency calculation is best-effort, so this can lead to a different resulting FD set, which causes flaky tests. This patch makes the calculation deterministic by iterating instead through a `intsets.Fast` set.

Fixes #107148
Fixes #107162

Release note: None

107181: dev: when cross-building, use `-c opt` r=rail a=rickystewart

This enables optimizations which you probably want for a cross-build.

Epic: CRDB-17171
Release note: None

107183: acceptance: add log dir as a writable path r=rail a=rickystewart

Otherwise you get a sandbox error.

Epic: CRDB-17171
Release note: None

Co-authored-by: Alex Sarkesian <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Sean Barag <[email protected]>
Co-authored-by: Drew Kimball <[email protected]>
@craig craig bot closed this as completed in f3b8cf6 Jul 19, 2023
AlexTalks added a commit that referenced this issue Jul 20, 2023
The kvserver test `TestDecommission`, which runs a 5-node cluster and
decommissions 4 of those 5 nodes, has trouble completing fast enough
when under a race or deadlock configuration. While race configurations
were already skipped, this modifies the test to be skipped under
deadlock configurations as well.

Fixes: #106096

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testing Testing tools and infrastructure branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-kv KV Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants