forked from cockroachdb/cockroach
storage: performance improvements for intent resolution
When intents are separated, the version suffix is the txn UUID, which bears no ordering relationship to preceding txns. In contrast, with interleaved intents, which reuse the same key, the latest intent has the highest Pebble seqnum.

Separated intents differ in another way: usually we use a SingleDelete to remove a previous intent written using a Set. When these meet as part of a flush or compaction, both disappear. In contrast, with interleaved intents, we write a Delete to remove a previous intent. When they meet in a compaction/flush, the Set disappears, but the Delete will not typically disappear until it reaches L6 (if there are any overlapping files in lower levels). So once the Delete/SingleDelete have been compacted, there are usually fewer obsolete key seqnums in Pebble with SingleDelete.

We consider three kinds of reads when many intents have been written and resolved for a key, and compactions have not happened. Note that this should not be common, especially in a node with write traffic spread across many keys and ranges that share the same LSM. But it has been observed in tests, and we may as well improve this behavior.

- Read of a specific intent written by a txn: a Seek is used to find that intent. With interleaved intents, the highest seqnum for that key is that live intent. With separated intents, on average, that live intent will have half the deleted intents preceding it and half succeeding it. The current code for separated intents does not use the txn UUID to skip past these deleted intents. This is fixed in this PR, and affects MVCCResolveWriteIntent.
- Read of an intent followed by read of the latest version: this can happen on the read path. With interleaved intents, one does not need to traverse any Deletes/Sets before encountering the intent, but calling Next on the pebble.Iterator will cause it to traverse all these deleted intents. With separated intents, half of these will be traversed before the intent and half after, so the count that is traversed is the same. Additionally, any compaction/flush will help more with separated intents since both SingleDelete and Set disappear.
- Read of all intents in a range: this happens with MVCCResolveWriteIntentRange. It uses Seeks to go from one intent to another. This is a bad case for separated intents since it needs to iterate over half the deleted intents for each live intent. I don't know of a way to optimize this, if indeed this is important to do, without changing the lock table key to prefix the txn UUID with the timestamp (which would make it longer and introduce expense).

This PR adds 3 benchmarks that mimic the above scenarios: BenchmarkIntentResolution, BenchmarkIntentScan, BenchmarkIntentRangeResolution. These benchmarks vary the number of versions and how many are flushed. Note that for the 400 version case, some flushing happens even without the explicit flush. Also, these benchmarks don't populate lower levels, so the tombstone elision code in Pebble will manage to elide even the Delete operations, and not just the SingleDeletes, when doing the flush.

It adds the following optimizations:

- MVCCIterator.SeekIntentGE can be used to seek to an intent for a particular txn UUID. It is used by MVCCResolveWriteIntent.
- Avoid seeking twice for MVCCResolveWriteIntentRange since the iterator is already positioned correctly.
- Memory allocation improvements to LockTableKey.ToEngineKey and intentInterleavingReader.

Informs cockroachdb#41720

Release note: None
1 parent 9171f18, commit d1c91e0
Showing 12 changed files with 403 additions and 61 deletions.