kv: allow range merge txn to be pushed, even after lease transfer #59308

nvanbenschoten · 2021-01-22T18:46:00Z

Needed for non-blocking transactions: #52745.

Currently, the range merge transaction is not allowed to be pushed to a higher timestamp. In the case that it is pushed, the transaction will abort and retry. This poses an issue for non-blocking transactions, which will cause all writes to a range to be pushed. As is, this makes non-blocking transactions incompatible with range merges.

To fix this, we will need to allow the range merge transaction to be pushed, at least during its local range descriptor writes. The current proposal is to do something like #44090. We will stop observing the range merge's commit timestamp at the beginning of the transaction. Instead, we will manually refresh [1] the transaction immediately before the transaction's critical phase (i.e. immediately before issuing the SubsumeRequest). After doing so, we can observe the commit timestamp to block any future refreshes, though we don't expect to see any.

This will be enough to make range merges compatible with non-blocking transactions.

[1] @ajwerner will be adding support for this in or around #56588.

The text was updated successfully, but these errors were encountered:

nvanbenschoten · 2021-02-25T19:59:02Z

Unfortunately, kvnemesis revealed that the fix we made in #60567 was insufficient. It turns out that the range merge transaction enters its "critical phase" earlier than the SubsumeRequest under some situations. The RHS can enter its critical phase earlier in cases where the RHS's lease moves after the deletion of its local range descriptor but before the SubsumeRequest - see the transfer of power section from the range merges tech note.

This is most easily demonstrates by issuing an AdminTransferLease request to the RHS after its local range descriptor has been deleted by this batch but before it has been subsumed by this batch. This is the sequence of events generated by kvnemesis which revealed this bug.

I intend to fix this by removing the manual refresh added in #60567, removing the _ = txn.CommitTimestamp() call in the range merge txn which prevents it from refreshing, and opening up the frozen RHS to allow refresh request on the local range descriptor from the range merge transaction (and only the range merge transaction). This should be safe, as a refresh from the range merge transaction indicates that the transaction has not yet committed, and the local range descriptor already has an intent on it while the RHS is frozen that must be resolved with the merge txn's commit timestamp if the merge does commit, which will necessarily exceed the refresh timestamp. This means that the fact that the refresh timestamp may exceed the freeze timestamp is safe.

I'll make this change once a similar bug dealing with range merges and lease transfers is fixed in #60905.

Fixes cockroachdb#59308. This commit adds support for range merge transactions to refresh. This is necessary to allow the merge transaction to have its write timestamp be bumped and still commit without retrying. In such cases, the transaction must refresh its reads, including its original read on the RHS's local range descriptor. If we were to block this refresh on the frozen RHS range, the merge would deadlock. On the surface, it seems unsafe to permit Refresh requests on an already subsumed RHS range, because the refresh's effect on the timestamp cache will never make it to the LHS leaseholder. This risks the future joint range serving a write that invalidates the Refresh. However, in this specific situation, we can be sure that such a serializability violation will not occur because the Range merge also writes to (deletes) this key. This means that if the Range merge transaction commits, its intent on the key will be resolved to the timestamp of the refresh and no future write will ever be able to violate the refresh. Conversely, if the Range merge transaction does not commit, then the merge will fail and the update to the RHS's timestamp cache will not be lost (not that this particularly matters in cases of aborted transactions). The same line of reasoning as the one above has motivated us to explore removing keys from a transaction's refresh spans when they are written to by the transaction, as the intents written by the transaction act as a form of pessimistic lock that obviate the need for the optimistic refresh. Such an improvement would eliminate the need for this special case, but until we generalize the mechanism to prune refresh spans based on intent spans, we're forced to live with this. See cockroachdb#59308 (comment) for why the original fix, which attempted to manually refresh the range merge transaction before it entered its critical phase, was not sufficient. Release justification: needed for new functionality.

61324: kv: permit merge transactions to refresh after subsumption r=nvanbenschoten a=nvanbenschoten Fixes #59308. This commit adds support for range merge transactions to refresh. This is necessary to allow the merge transaction to have its write timestamp be bumped and still commit without retrying. In such cases, the transaction must refresh its reads, including its original read on the RHS's local range descriptor. If we were to block this refresh on the frozen RHS range, the merge would deadlock. On the surface, it seems unsafe to permit Refresh requests on an already subsumed RHS range, because the refresh's effect on the timestamp cache will never make it to the LHS leaseholder. This risks the future joint range serving a write that invalidates the Refresh. However, in this specific situation, we can be sure that such a serializability violation will not occur because the Range merge also writes to (deletes) this key. This means that if the Range merge transaction commits, its intent on the key will be resolved to the timestamp of the refresh and no future write will ever be able to violate the refresh. Conversely, if the Range merge transaction does not commit, then the merge will fail and the update to the RHS's timestamp cache will not be lost (not that this particularly matters in cases of aborted transactions). The same line of reasoning as the one above has motivated us to explore removing keys from a transaction's refresh spans when they are written to by the transaction, as the intents written by the transaction act as a form of pessimistic lock that obviate the need for the optimistic refresh. Such an improvement would eliminate the need for this special case, but until we generalize the mechanism to prune refresh spans based on intent spans, we're forced to live with this. See #59308 (comment) for why the original fix, which attempted to manually refresh the range merge transaction before it entered its critical phase, was not sufficient. Release justification: needed for new functionality. Co-authored-by: Nathan VanBenschoten <[email protected]>

Fixes cockroachdb#59308. This commit adds support for range merge transactions to refresh. This is necessary to allow the merge transaction to have its write timestamp be bumped and still commit without retrying. In such cases, the transaction must refresh its reads, including its original read on the RHS's local range descriptor. If we were to block this refresh on the frozen RHS range, the merge would deadlock. On the surface, it seems unsafe to permit Refresh requests on an already subsumed RHS range, because the refresh's effect on the timestamp cache will never make it to the LHS leaseholder. This risks the future joint range serving a write that invalidates the Refresh. However, in this specific situation, we can be sure that such a serializability violation will not occur because the Range merge also writes to (deletes) this key. This means that if the Range merge transaction commits, its intent on the key will be resolved to the timestamp of the refresh and no future write will ever be able to violate the refresh. Conversely, if the Range merge transaction does not commit, then the merge will fail and the update to the RHS's timestamp cache will not be lost (not that this particularly matters in cases of aborted transactions). The same line of reasoning as the one above has motivated us to explore removing keys from a transaction's refresh spans when they are written to by the transaction, as the intents written by the transaction act as a form of pessimistic lock that obviate the need for the optimistic refresh. Such an improvement would eliminate the need for this special case, but until we generalize the mechanism to prune refresh spans based on intent spans, we're forced to live with this. See cockroachdb#59308 (comment) for why the original fix, which attempted to manually refresh the range merge transaction before it entered its critical phase, was not sufficient. Release justification: needed for new functionality.

nvanbenschoten added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-distribution Relating to rebalancing and leasing. A-kv-transactions Relating to MVCC and the transactional model. labels Jan 22, 2021

nvanbenschoten assigned ajwerner and nvanbenschoten Jan 22, 2021

ajwerner mentioned this issue Feb 15, 2021

kv/kvserver: allow the merge transaction to be pushed #60567

Merged

craig bot closed this as completed in 8f5dc42 Feb 16, 2021

nvanbenschoten reopened this Feb 25, 2021

nvanbenschoten changed the title ~~kv: allow range merge txn to be pushed~~ kv: allow range merge txn to be pushed, even after lease transfer Feb 25, 2021

nvanbenschoten mentioned this issue Mar 2, 2021

kv: permit merge transactions to refresh after subsumption #61324

Merged

craig bot closed this as completed in df1c620 Mar 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kv: allow range merge txn to be pushed, even after lease transfer #59308

kv: allow range merge txn to be pushed, even after lease transfer #59308

nvanbenschoten commented Jan 22, 2021

nvanbenschoten commented Feb 25, 2021

kv: allow range merge txn to be pushed, even after lease transfer #59308

kv: allow range merge txn to be pushed, even after lease transfer #59308

Comments

nvanbenschoten commented Jan 22, 2021

nvanbenschoten commented Feb 25, 2021