kv/kvnemesis: TestKVNemesisMultiNode failed #115076

Closed
cockroach-teamcity opened this issue Nov 26, 2023 · 1 comment · Fixed by #115177
Labels: A-testing, branch-release-23.1, C-bug, C-test-failure, O-robot, T-kv

cockroach-teamcity commented Nov 26, 2023

kv/kvnemesis.TestKVNemesisMultiNode failed with artifacts on release-23.1 @ 0f97132ad066321e6f65a83fe2098788162445ae:

=== RUN   TestKVNemesisMultiNode
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode57094381
    test_log_scope.go:79: use -show-logs to present logs inline
    kvnemesis_test.go:180: seed: 6880893433380561686
    kvnemesis_test.go:124: kvnemesis logging to /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3439017642
    kvnemesis.go:165: error applying x.AdminMerge(ctx, tk(14624680467153224630)) // kv/kvserver/replica_command.go:835: merge failed: waiting for all right-hand replicas to initialize: operation "wait for replicas init" timed out after 5.001s (given timeout 5s): grpc: context deadline exceeded [code 4/DeadlineExceeded]: kv/kvserver/replica_command.go:835: merge failed: waiting for all right-hand replicas to initialize: operation "wait for replicas init" timed out after 5.001s (given timeout 5s): grpc: context deadline exceeded [code 4/DeadlineExceeded]
    kvnemesis.go:165: error applying x.AdminMerge(ctx, tk(13806048622469307242)) // kv/kvserver/replica_command.go:835: merge failed: waiting for all right-hand replicas to initialize: operation "wait for replicas init" timed out after 5.001s (given timeout 5s): grpc: context deadline exceeded [code 4/DeadlineExceeded]: kv/kvserver/replica_command.go:835: merge failed: waiting for all right-hand replicas to initialize: operation "wait for replicas init" timed out after 5.001s (given timeout 5s): grpc: context deadline exceeded [code 4/DeadlineExceeded]
    kvnemesis.go:185: failures(verbose): /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3439017642/failures
        repro steps: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3439017642/repro.go
        rangefeed KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3439017642/kvs-rangefeed.txt
        scan KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3439017642/kvs-scan.txt
    kvnemesis_test.go:207: 
        	Error Trace:	github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:207
        	            				github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:160
        	Error:      	Should be zero, but was 2
        	Test:       	TestKVNemesisMultiNode
        	Messages:   	kvnemesis detected failures
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode57094381
--- FAIL: TestKVNemesisMultiNode (26.86s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

Jira issue: CRDB-33872

cockroach-teamcity added the branch-release-23.1, C-test-failure, O-robot, release-blocker, and T-kv labels Nov 26, 2023
cockroach-teamcity added this to the 23.1 milestone Nov 26, 2023
nvanbenschoten self-assigned this Nov 28, 2023
nvanbenschoten removed the release-blocker label Nov 28, 2023
nvanbenschoten added the C-bug and A-testing labels Nov 28, 2023

nvanbenschoten commented

I can quickly reproduce this with the following diff:

diff --git a/pkg/kv/kvnemesis/generator.go b/pkg/kv/kvnemesis/generator.go
index b16e8879bb6..5e87c759bc7 100644
--- a/pkg/kv/kvnemesis/generator.go
+++ b/pkg/kv/kvnemesis/generator.go
@@ -401,7 +401,7 @@ func newAllOperationsConfig() GeneratorConfig {
                },
                Merge: MergeConfig{
                        MergeNotSplit: 1,
-                       MergeIsSplit:  1,
+                       MergeIsSplit:  10,
                },
                ChangeReplicas: ChangeReplicasConfig{
                        AddReplica:        1,
diff --git a/pkg/kv/kvserver/replica_command.go b/pkg/kv/kvserver/replica_command.go
index 75ea8ca228c..ed7593a234f 100644
--- a/pkg/kv/kvserver/replica_command.go
+++ b/pkg/kv/kvserver/replica_command.go
@@ -874,7 +874,7 @@ func waitForReplicasInit(
        rangeID roachpb.RangeID,
        replicas []roachpb.ReplicaDescriptor,
 ) error {
-       return timeutil.RunWithTimeout(ctx, "wait for replicas init", 5*time.Second, func(ctx context.Context) error {
+       return timeutil.RunWithTimeout(ctx, "wait for replicas init", 1*time.Millisecond, func(ctx context.Context) error {
                g := ctxgroup.WithContext(ctx)
                for _, repl := range replicas {
                        repl := repl // copy for goroutine

when running:

dev test pkg/kv/kvnemesis -f=TestKVNemesisMultiNode --stress

This is a timing flake of a kind we've seen before in kvnemesis. It's not concerning, but we should deflake the test.

craig bot pushed a commit that referenced this issue Nov 28, 2023
114267: storage: move shared storage instantiation code to CCL r=RaduBerinde a=itsbilal

This change moves the shared storage factory
instantiation code to pkg/storageccl/engineccl/, as well as
rditer.IterateReplicaKeySpansShared. This change also
checks for an enterprise license before doing a fast rebalance.

Fixes #114185.

Epic: none

Release note: None

115177: kv: deflake kvnemesis "waiting for all right-hand replicas to initialize" failure r=nvanbenschoten a=nvanbenschoten

Fixes #115076.

We were already ignoring the "waiting for all left-hand replicas to initialize" error. Do the same with the "right-hand replicas" error.

Also, fix a broken regexp while here.

Release note: None

Co-authored-by: Bilal Akhtar <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
craig bot closed this as completed in 29dacc2 Nov 28, 2023
blathers-crl bot pushed a commit that referenced this issue Nov 28, 2023
…ize" failure

Fixes #115076.

We were already ignoring the "waiting for all left-hand replicas to
initialize" error. Do the same with the "right-hand replicas" error.

Also, fix a broken regexp while here.

Release note: None
blathers-crl bot pushed a second commit with the same message that referenced this issue Nov 28, 2023
github-project-automation bot moved this to Closed in KV Aug 28, 2024