kv: ClearRange can "uncommit" transactions #46764
That sounds like a real problem, good catch. If an intent is left on a table and the table is then dropped, we can potentially "uncommit" a transaction. This is another thing that will become much nicer with the dedicated lock table: we'll be able to check the lock table alone to ensure that we're not clearing any intents. Right now our only options are 1) if no estimates are present and IntentCount==0, proceed, or otherwise 2) scan the whole range and handle intents as needed. I agree that a prior full table scan would also solve this, but I would like ClearRange itself to maintain the low-level txn invariants.
61544: kv: add TestTxnClearRangeIntents to verify ClearRange intent behavior r=tbg a=erikgrinaker

This adds a test case verifying that a `ClearRange()` call can remove a write intent belonging to an implicitly committed `STAGING` transaction. This will cause subsequent txn recovery to roll back the entire txn, even though it has already been committed.

`ClearRange()` does require the caller to ensure there are no intents in the cleared range, but this is a bit of a footgun, and it should ideally ensure that txn invariants are enforced itself -- this will be addressed separately.

Touches #46764.

Release justification: non-production code changes

Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
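The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `txnRecord`, `store`, `clearRange`, and `recoverTxn` are hypothetical names invented for illustration, not CockroachDB types or APIs, and recovery is reduced to a single rule (a `STAGING` txn counts as committed only if every in-flight write still has its intent).

```go
package main

import "fmt"

// txnStatus is a simplified stand-in for a transaction record's status.
type txnStatus int

const (
	staging txnStatus = iota
	committed
	aborted
)

// txnRecord is a hypothetical, simplified transaction record. A STAGING
// record lists the keys of writes that were in flight when it was staged.
type txnRecord struct {
	id       string
	status   txnStatus
	inFlight []string
}

// store maps a key to the ID of the transaction holding an intent on it.
type store map[string]string

// clearRange removes every key in [start, end), intents included,
// without consulting any transaction record.
func (s store) clearRange(start, end string) {
	for k := range s {
		if k >= start && k < end {
			delete(s, k)
		}
	}
}

// recoverTxn finalizes a STAGING record. The txn is treated as implicitly
// committed only if an intent is found for every in-flight write; a single
// missing intent causes the whole txn to be marked aborted.
func recoverTxn(s store, rec *txnRecord) txnStatus {
	if rec.status != staging {
		return rec.status
	}
	for _, k := range rec.inFlight {
		if s[k] != rec.id {
			rec.status = aborted
			return rec.status
		}
	}
	rec.status = committed
	return rec.status
}

func main() {
	// Both in-flight writes succeeded, so the txn is implicitly committed.
	s := store{"a": "t1", "b": "t1"}
	rec := &txnRecord{id: "t1", status: staging, inFlight: []string{"a", "b"}}

	// Clearing the span containing "b" removes one of the intents.
	s.clearRange("b", "c")

	// Recovery no longer finds the intent on "b" and aborts the txn.
	fmt.Println("aborted after ClearRange:", recoverTxn(s, rec) == aborted)
}
```

In this model, running recovery before the `clearRange` call would have marked the same record as committed, which is the sense in which the clear "uncommits" the transaction.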
Discussed this with @tbg yesterday. We'll be implementing a fix using the separated intents lock table, which will be enabled by default in the medium term (there's otherwise no efficient way to detect intents short of scanning the range). If @sumeerbhola

As for detecting intents using the lock table, I see two options: either scan the lock table directly using

Also, I see that we're not planning to migrate existing intents to separated intents, which means that even after it's enabled by default we may still be vulnerable to this issue if there are old interleaved intents left in the range. Is that still the plan, and does that mean that we'd have to do a full range scan in all cases or is there some way to detect this?

cockroach/pkg/storage/intent_reader_writer.go
Lines 40 to 44 in d1c91e0
How about using the stats that I mentioned in #61544 (review)? Regarding the migration of existing interleaved intents to separated intents, the current plan is to write the migration for 21.2, so the migration would run after a cluster is finalized with only 21.2 nodes. I assumed we were trying to fix this for 21.1. I prefer using
I'll have a look, would be great if that was possible.
Right, for 21.1 I think we'll have to do a full range scan, although the stats could allow us to bypass this if they're reliable. We felt like we could probably punt this until separated intents are fully enabled and migrated in 21.2, as it is pretty edge-casey.
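The stats-based bypass discussed in this thread (no estimates present and IntentCount==0, otherwise scan) could be sketched roughly as follows. The `rangeStats` struct, its field types, and `planClearRange` are hypothetical simplifications for illustration only and are not the actual CockroachDB stats type or any existing function.

```go
package main

import "fmt"

// rangeStats is a hypothetical, simplified stand-in for a range's stats.
type rangeStats struct {
	ContainsEstimates bool  // true if the counters may be inexact
	IntentCount       int64 // number of intents recorded for the range
}

// clearPlan says how a ClearRange-like operation should proceed.
type clearPlan int

const (
	clearDirectly clearPlan = iota
	scanForIntentsFirst
)

// planClearRange takes the fast path only when the stats are exact and
// report zero intents. In every other case it falls back to scanning the
// range so that any intents can be handled before the data is removed.
func planClearRange(st rangeStats) clearPlan {
	if !st.ContainsEstimates && st.IntentCount == 0 {
		return clearDirectly
	}
	return scanForIntentsFirst
}

func main() {
	exactAndEmpty := rangeStats{ContainsEstimates: false, IntentCount: 0}
	estimated := rangeStats{ContainsEstimates: true, IntentCount: 0}
	hasIntents := rangeStats{ContainsEstimates: false, IntentCount: 3}

	fmt.Println(planClearRange(exactAndEmpty) == clearDirectly)
	fmt.Println(planClearRange(estimated) == scanForIntentsFirst)
	fmt.Println(planClearRange(hasIntents) == scanForIntentsFirst)
}
```

The sketch is deliberately conservative: estimated stats are never trusted, even when they report zero intents, since the bypass is only safe if the stats are reliable.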
Makes sense, I'll add a function.
Like the title describes, I'm curious whether a ClearRange request can clear an intent. If so, we may have a problem if it can clear an intent owned by a STAGING transaction. Could this allow an implicitly committed transaction to "uncommit" and later be marked as aborted?
The request type does have the following warning:
cockroach/pkg/roachpb/api.proto
Lines 291 to 293 in 8a50e9f
But I don't think anything ensures that the key range is inactive in the sense that it has no intents. That could be ensured by scanning over the range with a high-priority scan after taking it offline.
@tbg do you know anything about this?