kv: ClearRange can "uncommit" transactions #46764

nvanbenschoten · 2020-03-31T03:38:12Z

Like the title describes, I'm curious whether a ClearRange request can clear an intent. If so, we may have a problem if it can clear an intent owned by a STAGING transaction. Could this allow an implicitly committed transaction to "uncommit" and later be marked as aborted?

The request type does have the following warning:

cockroach/pkg/roachpb/api.proto

Lines 291 to 293 in 8a50e9f

    
           // NOTE: it is important that this method only be invoked on a key 
        
           // range which is guaranteed to be both inactive and not see future 
        
           // writes. Ignoring this warning may result in data loss.

But I don't think anything ensures that the key range is inactive in the sense that it has no intents. That could be ensured by scanning over the range with a high-priority scan after taking it offline.

@tbg do you know anything about this?

tbg · 2020-03-31T09:49:51Z

That sounds like it is a real problem, good catch. If an intent is left on a table and then said table is dropped, we'll potentially "uncommit" a transaction. This is another thing that will become much nicer with the dedicated lock table. We'll be able to ensure via the lock table only that we're not clearing any intents, where right now we have only the option 1) if no estimates present, and IntentCount==0, proceed or otherwise 2) scan the whole range and handle intents as needed.

I agree that a prior full table scan would also solve this, but I would like ClearRange itself to maintain the low-level txn invariants.

61544: kv: add TestTxnClearRangeIntents to verify ClearRange intent behavior r=tbg a=erikgrinaker This adds a test case verifying that a `ClearRange()` call can remove a write intent belonging to an implicitly committed `STAGING` transaction. This will cause subsequent txn recovery to roll back the entire txn, even though it has already been committed. `ClearRange()` does require the caller to ensure there are no intents in the cleared range, but this is a bit of a footgun, and it should ideally ensure that txn invariants are enforced itself -- this will be addressed separately. Touches #46764. Release justification: non-production code changes Release note: None Co-authored-by: Erik Grinaker <[email protected]>

erikgrinaker · 2021-03-10T12:44:44Z

Discussed this with @tbg yesterday. We'll be implementing a fix using the separated intents lock table, which will be enabled by default in the medium term (there's otherwise no efficient way to detect intents short of scanning the range). If STAGING intents are found, we'll return a WriteIntentError with all of them and let a higher layer deal with recovery before retrying the ClearRange call. We'll also have to take out a range lock for the call.

@sumeerbhola As for detecting intents using the lock table, I see two options: either scan the lock table directly using Reader.NewEngineIterator() with keys.LockTableSingleKey() bounds, or add an option MVCCIntentsIterKind for MVCCIterator that scans intents only. I think I prefer the latter, to keep the lower-level logic in storage. What do you you think?

Also, I see that we're not planning to migrate existing intents to separated intents, which means that even after it's enabled by default we may still be vulnerable to this issue if there are old interleaved intents left in the range. Is that still the plan, and does that mean that we'd have to do a full range scan in all cases or is there some way to detect this?

cockroach/pkg/storage/intent_reader_writer.go

Lines 40 to 44 in d1c91e0

    
           // Currently there is no long-running migration to replace all interleaved 
        
           // intents with separated intents, but we expect that when a cluster has been 
        
           // running with this flag set to true for some time, most ranges will only 
        
           // have separated intents. Similarly, setting this to false will gradually 
        
           // cause most ranges to only have interleaved intents.

sumeerbhola · 2021-03-10T14:49:16Z

How about using the stats, that I mentioned in #61544 (review)? Regarding the migration of existing interleaved intents to separated intents, the current plan is to write the migration for 21.2, so after a cluster is finalized with only 21.2 nodes the migration would run. I assumed we were trying to fix this for 21.1.

I prefer using EngineIterator. And yes, it is probably better if we don't expose understanding of the lock table keys in too many places outside the storage package. We could add a function to storage that takes a Reader and [start, end) and checks that there are no intents.

erikgrinaker · 2021-03-10T15:49:34Z

How about using the stats, that I mentioned in #61544 (review)?

I'll have a look, would be great if that was possible.

Regarding the migration of existing interleaved intents to separated intents, the current plan is to write the migration for 21.2, so after a cluster is finalized with only 21.2 nodes the migration would run. I assumed we were trying to fix this for 21.1.

Right, for 21.1 I think we'll have to do a full range scan, although the stats could allow us to bypass this if they're reliable. We felt like we could probably punt this until separated intents are fully enabled and migrated in 21.2, as it is pretty edge-casey.

I prefer using EngineIterator. And yes, it is probably better if we don't expose understanding of the lock table keys in too many places outside the storage package. We could add a function to storage that takes a Reader and [start, end) and checks that there are no intents.

Makes sense, I'll add a function.

nvanbenschoten added C-question A question rather than an issue. No code/spec/doc change needed. A-kv-transactions Relating to MVCC and the transactional model. labels Mar 31, 2020

erikgrinaker self-assigned this Mar 2, 2021

This was referenced Mar 5, 2021

kv: add TestTxnClearRangeIntents to verify ClearRange intent behavior #61544

Merged

kvserver: ClearRange optimization does not remove separated intents #61606

Closed

tbg changed the title ~~kv: can a ClearRange request clear an intent?~~ kv: ClearRange can "uncommit" transactions? Mar 8, 2021

tbg changed the title ~~kv: ClearRange can "uncommit" transactions?~~ kv: ClearRange can "uncommit" transactions Mar 8, 2021

erikgrinaker mentioned this issue Mar 11, 2021

kvserver: detect and return intents in ClearRange #61850

Merged

craig bot closed this as completed in 3267fc5 Mar 17, 2021

erikgrinaker mentioned this issue Mar 22, 2021

release-21.1: kvserver: detect and return intents in ClearRange #62376

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kv: ClearRange can "uncommit" transactions #46764

kv: ClearRange can "uncommit" transactions #46764

nvanbenschoten commented Mar 31, 2020

tbg commented Mar 31, 2020

erikgrinaker commented Mar 10, 2021 •

edited

Loading

sumeerbhola commented Mar 10, 2021

erikgrinaker commented Mar 10, 2021

kv: ClearRange can "uncommit" transactions #46764

kv: ClearRange can "uncommit" transactions #46764

Comments

nvanbenschoten commented Mar 31, 2020

tbg commented Mar 31, 2020

erikgrinaker commented Mar 10, 2021 • edited Loading

sumeerbhola commented Mar 10, 2021

erikgrinaker commented Mar 10, 2021

erikgrinaker commented Mar 10, 2021 •

edited

Loading