kvnemesis: TestKVNemesisSingleNode: committed deleteRangeUsingTombstone non-atomic timestamps #104865
Comments
cc @cockroachdb/replication |
Hi @tbg, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
I did something to make this work when I wrote kvnemesis v2 (by preventing these requests from being split across ranges). Maybe that somehow broke; I'll take a quick look. |
Yeah we just prevent range-spanning here: cockroach/pkg/kv/kvnemesis/kvnemesis_test.go Lines 61 to 77 in 736a67e
and we don't ever use it in a txn: cockroach/pkg/kv/kvnemesis/generator.go Lines 270 to 271 in 6369846
I'm not sure what's going on here, but kvnemesis was changed a few times recently by @nvanbenschoten, most recently in #104356. |
Ah no, |
Ok, I'll have a look in a bit. |
The error shows that this DeleteRangeUsingTombstone (DRUT) created six observed fragments in the rangefeed. These fragments all ought to be at the same timestamp, but they are not: the first four fragments match, then we get two more fragments, each with its own timestamp. The official execution timestamp ends up being the second-to-last unique timestamp (1686739362.069868437,0), which is also the largest of the bunch. That timestamp is interesting because we also committed a SNAPSHOT txn right then:
Note that the txn timestamp is the DRUT's timestamp plus one logical tick. Perhaps this txn and the DRUT interacted in a way that facilitated this issue. However, the last DRUT fragment (timestamp 1686739362.0692...) is not such an easy match for anything: that timestamp doesn't show up anywhere else. I'll leave the hard part of the analysis to you, @erikgrinaker, but I'm happy to be a rubber 🦆. |
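For readers following along, here is a minimal Go sketch of the invariant being discussed: every fragment written by a single non-transactional DRUT should carry the same timestamp. The `Timestamp` and `Fragment` types and the `DistinctTimestamps` and `CheckAtomic` helpers are assumptions made up for illustration; they are not kvnemesis's validator or CockroachDB's `hlc.Timestamp`.

```go
package main

import "fmt"

const nanosPerSec int64 = 1e9

// Timestamp is a simplified stand-in for an HLC timestamp: wall time in
// nanoseconds plus a logical counter. Hypothetical, for illustration only.
type Timestamp struct {
	WallTime int64
	Logical  int32
}

// String renders the timestamp as seconds.nanoseconds,logical.
func (t Timestamp) String() string {
	return fmt.Sprintf("%d.%09d,%d", t.WallTime/nanosPerSec, t.WallTime%nanosPerSec, t.Logical)
}

// Next returns the timestamp exactly one logical tick after t.
func (t Timestamp) Next() Timestamp {
	return Timestamp{WallTime: t.WallTime, Logical: t.Logical + 1}
}

// Fragment is a hypothetical observed range-tombstone fragment.
type Fragment struct {
	StartKey, EndKey string
	TS               Timestamp
}

// DistinctTimestamps returns the distinct fragment timestamps in order of
// first appearance.
func DistinctTimestamps(frags []Fragment) []Timestamp {
	seen := make(map[Timestamp]bool)
	var out []Timestamp
	for _, f := range frags {
		if !seen[f.TS] {
			seen[f.TS] = true
			out = append(out, f.TS)
		}
	}
	return out
}

// CheckAtomic returns an error if the fragments of one operation do not all
// share a single timestamp.
func CheckAtomic(frags []Fragment) error {
	if ts := DistinctTimestamps(frags); len(ts) > 1 {
		return fmt.Errorf("non-atomic timestamps: %v", ts)
	}
	return nil
}
```

Under these assumptions, six fragments with three distinct timestamps (as in the failure above) would make `CheckAtomic` return an error, and `Next` models the "plus one logical tick" relationship between the DRUT and the SNAPSHOT txn.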
Looking at the trace, it does seem like DistSender split up the request!
So this is likely a bug in the suppression of precisely that, i.e. a test-only problem. |
I think the interceptor that prevents the range-spanning request accidentally got removed here: https://github.com/cockroachdb/cockroach/pull/103963/files#r1229494377 @wenyihu6 could you send a fix? Thanks! |
104658: kvserver: attempt to explain ProposalData lifecycle r=tbg a=tbg
Epic: CRDB-25287
Release note: None

104867: kvnemesis: add TestingKnobs.OnRangeSpanningNonTxnalBatch back r=tbg a=wenyihu6
#103963 accidentally removed a testing knob which caused #104865. This commit adds the testing knob back.
Fixes: #104865
Release note: none

Co-authored-by: Tobias Grieger
Co-authored-by: Wenyi
Describe the problem
Seen on a non-batched bors build [1] for #104658 on top of 3572499 (#104685 changes only comments).
Artifacts: 1f42cf5be2fc021646bf9b2daf5eaef3.zip
Jira issue: CRDB-28758
Footnotes
[1] https://teamcity.cockroachdb.com/viewLog.html?buildId=10528742&tab=buildResultsDiv&buildTypeId=Cockroach_UnitTests_BazelUnitTests