[WIP] storage: intentInterleavingIter support for multiple intents

We allow a key (with multiple versions) to have multiple intents, under
the condition that at most one of the intents is uncommitted. To aid
this behavior we introduce a commitMap that maintains a logical set of
TxnIDs of transactions that were "simple" in their behavior and have
committed, where simple is defined as all of the following conditions:
- No savepoint rollbacks: len(intent.IgnoredSeqNums)==0
- Single epoch, i.e., TxnMeta.Epoch==0
- Never pushed, i.e., TxnMeta.MinTimestamp==TxnMeta.WriteTimestamp

For such transactions, their provisional values can be considered
committed with the current version and local timestamp, i.e., we need
no additional information about the txn other than the TxnID.

Adding to commitMap:
The earliest a txn can be added to the commitMap is when the
transaction is in STAGING and has verified all the conditions for
commit. That is, this can be done before the transition to COMMITTED is
replicated. For the node that has the leaseholder of the range
containing the txn record, this requires no external communication. For
other nodes with intents for the txn, one could piggyback this
information on the RPCs for async intent resolution, and add to the
commitMap before doing the intent resolution -- this piggybacking would
incur 2 consensus rounds of contention. If we are willing to send the
RPC earlier, it will be contention for 1 consensus round only. Note
that these RPCs should also remove locks from the non-persistent lock
table data-structure, so they should send information about the keys
(like in a LockUpdate, but without removing the durable lock state).

Removal from commitMap:
The commitMap can be considered a cache of TxnIDs. It is helpful to
have a txn in the cache until its intents have been resolved.
Additionally, latchless intent resolution must pin a txn in the map
before it releases latches and unpin it when the intent resolution has
been applied on the leaseholder.
This pinning behavior is needed to ensure the correctness of the
in-memory concurrency.lockTable, which must maintain the property that
the replicated locks known to it are a subset of the persistent
replicated locks. We are assuming here that there is no lockTable on
followers.

Why "simple-committed":
- We don't want to have to coordinate intent resolution of these
  multiple intents by mandating that the resolution happen in any
  particular order.
- We want to guarantee that even if the commitMap is cleared (since it
  is a cache), we can maintain the invariant that a caller iterating
  over a key sees at most one intent. As we will illustrate below,
  providing this guarantee requires us to limit the commitMap to only
  contain simple-committed txns.

Consider a key with timestamps t5, t4, t3, t2, t1 in decreasing order
and intents for t5, t4, t2, with corresponding txns txn5, ... txn1. We
consider the disposition of an intent to be either unknown or
simple-committed. In this example, the disposition of the intents for
t4 and t2 is simple-committed solely based on the fact that there is at
least one version (provisional or otherwise) more recent than the
timestamp of the intent. That is, at most one intent, the one for t5,
has a disposition that needs to rely on knowledge that is not
self-contained in the history. For t5, we must rely on the commitMap to
decide whether its disposition is unknown or simple-committed.

It is possible that some user of the intentInterleavingIter saw t5 as
simple-committed and a later user sees it as unknown disposition, if
txn5 got removed from the commitMap -- such regression is harmless
since the latter user will simply have to do intent resolution. Note
that the intent for t5 could get resolved before those for t4 and t2,
and that is also fine since the disposition of t4, t2 stays
simple-committed. If txn5 is aborted and the intent for t5 removed, and
txn4 is no longer in the commitMap, the disposition of t4 could change
to unknown.
This is also acceptable, since t5 was only serving as a "local promise"
that t4 was committed, which is simply an optimization. There is still
a less efficient globally available "promise" that t4 is committed, and
intent resolution of t4 is how we will enact that promise.

Maintaining the above guarantees requires that historical versions must
not be garbage collected without resolving intents. This is acceptable
since GC is not latency sensitive.

This PR only introduces the changes for the intentInterleavingIter. It
excludes:
- A sophisticated commitMap data-structure. There are code comments
  sketching out what properties are desirable.
- The iterForKeyVersions implementation used for intent resolution, now
  that we can have multiple intents. This will be straightforward.
- The changes for latchless intent resolution. These should be
  straightforward once we have a commitMap with pinning support and we
  extend cockroachdb#55461 to use a consistent storage snapshot for
  intent resolution.
- The KV layer changes to send lists of intent keys for
  simple-committed txns to the various ranges, so that they can add to
  their commitMap and remove from the in-memory lockTable.

Informs cockroachdb#66867

Release note: None
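
The "simple" conditions and the disposition rule from the t5..t1 example
can be sketched in a few lines of Go. This is an illustrative sketch
only, not the code in this commit: txnState, version, commitMap,
addIfSimple, dispositions and exampleKey are all hypothetical names, the
commitMap is modeled as a plain Go map with no pinning or eviction, and
timestamps are reduced to integers.

```go
package main

import "fmt"

// All types here are hypothetical stand-ins, not CockroachDB's real ones.
type TxnID string

type Disposition int

const (
	Unknown Disposition = iota
	SimpleCommitted
)

func (d Disposition) String() string {
	if d == SimpleCommitted {
		return "simple-committed"
	}
	return "unknown"
}

// txnState carries only the fields that the "simple" conditions mention.
type txnState struct {
	ID                TxnID
	Epoch             int32
	MinTimestamp      int64
	WriteTimestamp    int64
	NumIgnoredSeqNums int // stands in for len(intent.IgnoredSeqNums)
}

// isSimple checks the three conditions: no savepoint rollbacks,
// a single epoch, and never pushed.
func isSimple(t txnState) bool {
	return t.NumIgnoredSeqNums == 0 && t.Epoch == 0 &&
		t.MinTimestamp == t.WriteTimestamp
}

// commitMap is modeled as a set of TxnIDs (assumption: no pinning).
type commitMap map[TxnID]struct{}

// addIfSimple records a committed txn only if it was simple.
func (m commitMap) addIfSimple(t txnState) bool {
	if !isSimple(t) {
		return false
	}
	m[t.ID] = struct{}{}
	return true
}

// version is one MVCC version of a key, optionally carrying an intent.
type version struct {
	Timestamp int64
	HasIntent bool
	Txn       TxnID
}

// dispositions expects versions sorted newest first. An intent that has
// any newer version above it is simple-committed by the history alone;
// only the newest version needs to consult the commitMap.
func dispositions(versions []version, m commitMap) map[int64]Disposition {
	out := make(map[int64]Disposition)
	for i, v := range versions {
		if !v.HasIntent {
			continue
		}
		_, inMap := m[v.Txn]
		if i > 0 || inMap {
			out[v.Timestamp] = SimpleCommitted
		} else {
			out[v.Timestamp] = Unknown
		}
	}
	return out
}

// exampleKey builds the key from the text: t5..t1, intents at t5, t4, t2.
func exampleKey() []version {
	return []version{
		{Timestamp: 5, HasIntent: true, Txn: "txn5"},
		{Timestamp: 4, HasIntent: true, Txn: "txn4"},
		{Timestamp: 3},
		{Timestamp: 2, HasIntent: true, Txn: "txn2"},
		{Timestamp: 1},
	}
}

func main() {
	key := exampleKey()
	d := dispositions(key, commitMap{}) // empty cache
	for _, v := range key {
		if v.HasIntent {
			fmt.Printf("t%d (%s): %s\n", v.Timestamp, v.Txn, d[v.Timestamp])
		}
	}
}
```

In this model, clearing the map can only change the disposition of the
newest intent, which matches the "at most one intent" invariant
described above.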