Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: allow range merge txn to be pushed, even after lease transfer #59308

Closed
nvanbenschoten opened this issue Jan 22, 2021 · 1 comment · Fixed by #60567 or #61324
Closed

kv: allow range merge txn to be pushed, even after lease transfer #59308

nvanbenschoten opened this issue Jan 22, 2021 · 1 comment · Fixed by #60567 or #61324
Assignees
Labels
A-kv-distribution Relating to rebalancing and leasing. A-kv-transactions Relating to MVCC and the transactional model. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@nvanbenschoten
Copy link
Member

Needed for non-blocking transactions: #52745.

Currently, the range merge transaction is not allowed to be pushed to a higher timestamp. In the case that it is pushed, the transaction will abort and retry. This poses an issue for non-blocking transactions, which will cause all writes to a range to be pushed. As is, this makes non-blocking transactions incompatible with range merges.

To fix this, we will need to allow the range merge transaction to be pushed, at least during its local range descriptor writes. The current proposal is to do something like #44090. We will stop observing the range merge's commit timestamp at the beginning of the transaction. Instead, we will manually refresh [1] the transaction immediately before the transaction's critical phase (i.e. immediately before issuing the SubsumeRequest). After doing so, we can observe the commit timestamp to block any future refreshes, though we don't expect to see any.

This will be enough to make range merges compatible with non-blocking transactions.

[1] @ajwerner will be adding support for this in or around #56588.

@nvanbenschoten nvanbenschoten added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-distribution Relating to rebalancing and leasing. A-kv-transactions Relating to MVCC and the transactional model. labels Jan 22, 2021
@craig craig bot closed this as completed in 8f5dc42 Feb 16, 2021
@nvanbenschoten
Copy link
Member Author

Unfortunately, kvnemesis revealed that the fix we made in #60567 was insufficient. It turns out that the range merge transaction enters its "critical phase" earlier than the SubsumeRequest under some situations. The RHS can enter its critical phase earlier in cases where the RHS's lease moves after the deletion of its local range descriptor but before the SubsumeRequest - see the transfer of power section from the range merges tech note.

This is most easily demonstrates by issuing an AdminTransferLease request to the RHS after its local range descriptor has been deleted by this batch but before it has been subsumed by this batch. This is the sequence of events generated by kvnemesis which revealed this bug.

I intend to fix this by removing the manual refresh added in #60567, removing the _ = txn.CommitTimestamp() call in the range merge txn which prevents it from refreshing, and opening up the frozen RHS to allow refresh request on the local range descriptor from the range merge transaction (and only the range merge transaction). This should be safe, as a refresh from the range merge transaction indicates that the transaction has not yet committed, and the local range descriptor already has an intent on it while the RHS is frozen that must be resolved with the merge txn's commit timestamp if the merge does commit, which will necessarily exceed the refresh timestamp. This means that the fact that the refresh timestamp may exceed the freeze timestamp is safe.

I'll make this change once a similar bug dealing with range merges and lease transfers is fixed in #60905.

@nvanbenschoten nvanbenschoten changed the title kv: allow range merge txn to be pushed kv: allow range merge txn to be pushed, even after lease transfer Feb 25, 2021
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 2, 2021
Fixes cockroachdb#59308.

This commit adds support for range merge transactions to refresh. This
is necessary to allow the merge transaction to have its write timestamp
be bumped and still commit without retrying. In such cases, the
transaction must refresh its reads, including its original read on the
RHS's local range descriptor. If we were to block this refresh on the
frozen RHS range, the merge would deadlock.

On the surface, it seems unsafe to permit Refresh requests on an already
subsumed RHS range, because the refresh's effect on the timestamp cache
will never make it to the LHS leaseholder. This risks the future joint
range serving a write that invalidates the Refresh. However, in this
specific situation, we can be sure that such a serializability violation
will not occur because the Range merge also writes to (deletes) this
key. This means that if the Range merge transaction commits, its intent
on the key will be resolved to the timestamp of the refresh and no
future write will ever be able to violate the refresh. Conversely, if the
Range merge transaction does not commit, then the merge will fail and
the update to the RHS's timestamp cache will not be lost (not that this
particularly matters in cases of aborted transactions).

The same line of reasoning as the one above has motivated us to explore
removing keys from a transaction's refresh spans when they are written
to by the transaction, as the intents written by the transaction act as
a form of pessimistic lock that obviate the need for the optimistic
refresh. Such an improvement would eliminate the need for this special
case, but until we generalize the mechanism to prune refresh spans based
on intent spans, we're forced to live with this.

See cockroachdb#59308 (comment)
for why the original fix, which attempted to manually refresh the range
merge transaction before it entered its critical phase, was not sufficient.

Release justification: needed for new functionality.
craig bot pushed a commit that referenced this issue Mar 3, 2021
61324: kv: permit merge transactions to refresh after subsumption r=nvanbenschoten a=nvanbenschoten

Fixes #59308.

This commit adds support for range merge transactions to refresh. This
is necessary to allow the merge transaction to have its write timestamp
be bumped and still commit without retrying. In such cases, the
transaction must refresh its reads, including its original read on the
RHS's local range descriptor. If we were to block this refresh on the
frozen RHS range, the merge would deadlock.

On the surface, it seems unsafe to permit Refresh requests on an already
subsumed RHS range, because the refresh's effect on the timestamp cache
will never make it to the LHS leaseholder. This risks the future joint
range serving a write that invalidates the Refresh. However, in this
specific situation, we can be sure that such a serializability violation
will not occur because the Range merge also writes to (deletes) this
key. This means that if the Range merge transaction commits, its intent
on the key will be resolved to the timestamp of the refresh and no
future write will ever be able to violate the refresh. Conversely, if the
Range merge transaction does not commit, then the merge will fail and
the update to the RHS's timestamp cache will not be lost (not that this
particularly matters in cases of aborted transactions).

The same line of reasoning as the one above has motivated us to explore
removing keys from a transaction's refresh spans when they are written
to by the transaction, as the intents written by the transaction act as
a form of pessimistic lock that obviate the need for the optimistic
refresh. Such an improvement would eliminate the need for this special
case, but until we generalize the mechanism to prune refresh spans based
on intent spans, we're forced to live with this.

See #59308 (comment)
for why the original fix, which attempted to manually refresh the range
merge transaction before it entered its critical phase, was not sufficient.

Release justification: needed for new functionality.

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig craig bot closed this as completed in df1c620 Mar 3, 2021
tbg pushed a commit to tbg/cockroach that referenced this issue Mar 4, 2021
Fixes cockroachdb#59308.

This commit adds support for range merge transactions to refresh. This
is necessary to allow the merge transaction to have its write timestamp
be bumped and still commit without retrying. In such cases, the
transaction must refresh its reads, including its original read on the
RHS's local range descriptor. If we were to block this refresh on the
frozen RHS range, the merge would deadlock.

On the surface, it seems unsafe to permit Refresh requests on an already
subsumed RHS range, because the refresh's effect on the timestamp cache
will never make it to the LHS leaseholder. This risks the future joint
range serving a write that invalidates the Refresh. However, in this
specific situation, we can be sure that such a serializability violation
will not occur because the Range merge also writes to (deletes) this
key. This means that if the Range merge transaction commits, its intent
on the key will be resolved to the timestamp of the refresh and no
future write will ever be able to violate the refresh. Conversely, if the
Range merge transaction does not commit, then the merge will fail and
the update to the RHS's timestamp cache will not be lost (not that this
particularly matters in cases of aborted transactions).

The same line of reasoning as the one above has motivated us to explore
removing keys from a transaction's refresh spans when they are written
to by the transaction, as the intents written by the transaction act as
a form of pessimistic lock that obviate the need for the optimistic
refresh. Such an improvement would eliminate the need for this special
case, but until we generalize the mechanism to prune refresh spans based
on intent spans, we're forced to live with this.

See cockroachdb#59308 (comment)
for why the original fix, which attempted to manually refresh the range
merge transaction before it entered its critical phase, was not sufficient.

Release justification: needed for new functionality.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-distribution Relating to rebalancing and leasing. A-kv-transactions Relating to MVCC and the transactional model. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
2 participants