kv/kvnemesis: TestKVNemesisMultiNode failed #104476

Closed
cockroach-teamcity opened this issue Jun 7, 2023 · 8 comments
Assignees
Labels
A-kv-replication: Relating to Raft, consensus, and coordination.
branch-release-23.1: Used to mark GA and release blockers, technical advisories, and bugs for 23.1
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
C-test-failure: Broken test (automatically or manually discovered).
O-robot: Originated from a bot.
T-kv: KV Team
X-duplicate: Closed as a duplicate of another issue.
X-unactionable: This was closed because it was unactionable.
Milestone

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Jun 7, 2023

kv/kvnemesis.TestKVNemesisMultiNode failed with artifacts on release-23.1 @ 1a5dc2aad16ca4185afcda75d923804324dcfeb9:

=== RUN   TestKVNemesisMultiNode
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode1082590582
    test_log_scope.go:79: use -show-logs to present logs inline
    kvnemesis_test.go:180: seed: 1281984876317631820
    kvnemesis_test.go:124: kvnemesis logging to /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3950191080
    kvnemesis.go:165: error applying x.ScanForUpdate(ctx, tk(5364256596085298844), tk(9743500752805155333), 0) // WriteTooOldError: write for key /Table/100/"5943595007238dc8" at timestamp 1686132671.535140016,0 too old; wrote at 1686132671.580430305,1: {<nil> 0 {0xc00896e6e0} <nil> 1686132671.654509064,0}
    kvnemesis.go:185: failures(verbose): /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3950191080/failures
        repro steps: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3950191080/repro.go
        rangefeed KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3950191080/kvs-rangefeed.txt
        scan KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3950191080/kvs-scan.txt
    kvnemesis_test.go:207: 
        	Error Trace:	github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:207
        	            				github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:160
        	Error:      	Should be zero, but was 1
        	Test:       	TestKVNemesisMultiNode
        	Messages:   	kvnemesis detected failures
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode1082590582
--- FAIL: TestKVNemesisMultiNode (19.52s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

Jira issue: CRDB-28548

@cockroach-teamcity cockroach-teamcity added branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-kv KV Team labels Jun 7, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Jun 7, 2023
@irfansharif
Contributor

23.1 dup of this unfixed master issue: #101721.

@aliher1911
Contributor

It's hard to say whether this is a dup. TestKVNemesisMultiNode is a catch-all for all sorts of underlying issues that kvnemesis detects.
I don't see this particular error in any of the failures in the linked issue.
Unfortunately, we keep losing the artifacts for these failures, and they are hard to reproduce.

@pav-kv
Collaborator

pav-kv commented Jun 13, 2023

@aliher1911 @irfansharif Does either of you still have the artifacts? Otherwise I'm tempted to close this as unactionable and hope we catch it while fixing #101721.

@pav-kv pav-kv added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-replication Relating to Raft, consensus, and coordination. labels Jun 13, 2023
@blathers-crl

blathers-crl bot commented Jun 13, 2023

cc @cockroachdb/replication

@pav-kv pav-kv added X-duplicate Closed as a duplicate of another issue. X-unactionable This was closed because it was unactionable. and removed T-kv-replication labels Jun 13, 2023
@irfansharif
Contributor

I don't; closing it seems appropriate. I know we bounced a lot of TC agents recently, so maybe that's why we lost them? The last failure was only a week ago.

@nvanbenschoten
Member

I think we're seeing the effect of a non-transactional locking read request that hits write-write contention. Evidently we don't handle that case correctly.
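
To make the reported error concrete, here is a toy Go sketch of the timestamp comparison behind the message in the failure log. It is only an illustration under stated assumptions: the `Timestamp` type and the `tooOld` helper are hypothetical and are not CockroachDB's `hlc.Timestamp` or its MVCC code. The two timestamps are copied from the log line above.

```go
package main

import "fmt"

// Timestamp is a simplified stand-in for a hybrid-logical-clock timestamp
// (hypothetical type, not CockroachDB's hlc.Timestamp).
type Timestamp struct {
	WallNanos int64 // wall-clock time in nanoseconds
	Logical   int32 // logical counter that breaks ties at the same wall time
}

// Less reports whether t orders strictly before o.
func (t Timestamp) Less(o Timestamp) bool {
	if t.WallNanos != o.WallNanos {
		return t.WallNanos < o.WallNanos
	}
	return t.Logical < o.Logical
}

// tooOld reports whether a request at reqTS is below the timestamp at which
// the key was actually written (hypothetical helper for illustration).
func tooOld(reqTS, wroteAt Timestamp) bool {
	return reqTS.Less(wroteAt)
}

func main() {
	// Values taken from the failure log:
	// "at timestamp 1686132671.535140016,0 too old; wrote at 1686132671.580430305,1"
	reqTS := Timestamp{WallNanos: 1686132671535140016, Logical: 0}
	wroteAt := Timestamp{WallNanos: 1686132671580430305, Logical: 1}

	fmt.Println("request is too old:", tooOld(reqTS, wroteAt))
	fmt.Println("gap in nanoseconds:", wroteAt.WallNanos-reqTS.WallNanos)
}
```

In this model the locking scan's timestamp trails the conflicting write by roughly 45 ms, which is the kind of write-write contention described above.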

@tbg
Member

tbg commented Jun 27, 2023

Possibly related to #105330

@nvanbenschoten nvanbenschoten self-assigned this Jun 27, 2023
@nvanbenschoten
Member

I've stressed this for over 1.3M iterations and 30 hours on 20 GCE nodes without seeing a failure. During that time, I ran with the following patch applied to increase the number of ScanForUpdate operations in the workload mix:

diff --git a/pkg/kv/kvnemesis/generator.go b/pkg/kv/kvnemesis/generator.go
index 691cca96bcc..d41e760a573 100644
--- a/pkg/kv/kvnemesis/generator.go
+++ b/pkg/kv/kvnemesis/generator.go
@@ -188,14 +188,14 @@ func newAllOperationsConfig() GeneratorConfig {
                GetMissingForUpdate:       1,
                GetExisting:               1,
                GetExistingForUpdate:      1,
-               PutMissing:                1,
-               PutExisting:               1,
+               PutMissing:                25,
+               PutExisting:               25,
                Scan:                      1,
-               ScanForUpdate:             1,
+               ScanForUpdate:             50,
                ReverseScan:               1,
                ReverseScanForUpdate:      1,
-               DeleteMissing:             1,
-               DeleteExisting:            1,
+               DeleteMissing:             25,
+               DeleteExisting:            25,
                DeleteRange:               1,
                DeleteRangeUsingTombstone: 1,
                AddSSTable:                1,
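
As a rough illustration of what the patch does to the mix, the Go sketch below computes the ScanForUpdate share before and after. It assumes operations are drawn with probability proportional to their weight and counts only the 14 entries visible in the hunk; the real config contains more operations, so the true shares differ. The `opWeight`, `visibleOps`, and `share` names are hypothetical and are not part of kvnemesis.

```go
package main

import "fmt"

// opWeight pairs an operation name with its relative weight in the mix.
type opWeight struct {
	name   string
	weight int
}

// visibleOps lists only the entries shown in the diff hunk (an assumption:
// the full GeneratorConfig has more operations than these).
// put and del apply to both the Missing and Existing variants.
func visibleOps(put, scanForUpdate, del int) []opWeight {
	return []opWeight{
		{"GetMissingForUpdate", 1},
		{"GetExisting", 1},
		{"GetExistingForUpdate", 1},
		{"PutMissing", put},
		{"PutExisting", put},
		{"Scan", 1},
		{"ScanForUpdate", scanForUpdate},
		{"ReverseScan", 1},
		{"ReverseScanForUpdate", 1},
		{"DeleteMissing", del},
		{"DeleteExisting", del},
		{"DeleteRange", 1},
		{"DeleteRangeUsingTombstone", 1},
		{"AddSSTable", 1},
	}
}

// share returns weight(name) divided by the sum of all weights, i.e. the
// expected fraction of draws that pick name under weighted random selection.
func share(ops []opWeight, name string) float64 {
	total, w := 0, 0
	for _, op := range ops {
		total += op.weight
		if op.name == name {
			w = op.weight
		}
	}
	if total == 0 {
		return 0
	}
	return float64(w) / float64(total)
}

func main() {
	before := share(visibleOps(1, 1, 1), "ScanForUpdate")
	after := share(visibleOps(25, 50, 25), "ScanForUpdate")
	fmt.Printf("ScanForUpdate share before: %.1f%%\n", 100*before)
	fmt.Printf("ScanForUpdate share after:  %.1f%%\n", 100*after)
}
```

Among the visible entries alone, the share moves from 1/14 to 50/159, with the Put and Delete weights raised alongside it to generate the writes that the scans contend with.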

Given that this is an unexpected but innocuous error for a non-transactional ScanForUpdate request (something we never issue in practice), I'm going to close this out. We've already spent $412.85 (according to roachprod) trying to reproduce it, and it doesn't seem worthwhile to push any harder unless we see it again.
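
For a sense of what a clean stress run can and cannot rule out, here is a small Go sketch of the standard zero-failure bound. It assumes each iteration fails independently with the same probability, rounds the run to exactly 1.3M iterations, and applies only to the patched workload mix that was stressed. The `upperBound` function name is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// upperBound returns the largest per-iteration failure probability p that is
// still consistent, at significance level alpha, with observing zero failures
// in n independent iterations. It solves (1-p)^n = alpha for p.
func upperBound(n, alpha float64) float64 {
	// Expm1 keeps precision for the tiny exponent log(alpha)/n.
	return -math.Expm1(math.Log(alpha) / n)
}

func main() {
	const iterations = 1.3e6 // "over 1.3M iterations", rounded down
	p := upperBound(iterations, 0.05)
	fmt.Printf("95%% upper bound on failure probability: %.3g\n", p)
	fmt.Printf("rule-of-three approximation (3/n):      %.3g\n", 3/iterations)
}
```

Under those assumptions the failure rate is bounded at a few per million iterations, which is consistent with the decision not to push harder on reproduction.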


6 participants