db: SeekPrefixGE lazy positioning, or GetPrefix #2002

Open
jbowens opened this issue Oct 11, 2022 · 4 comments
@jbowens
Collaborator

jbowens commented Oct 11, 2022

I have not thought through the design space here in detail, but it seems possible to use MVCC metadata about sstables (e.g., the computed block properties) to avoid reading files that contain only older versions of a key during a SeekPrefixGE. The goal would be to reduce block reads during MVCCGets, making MVCCGet's performance profile more similar to a Pebble Get's.

A design, just to serve as an illustrative example:

  • the sstable iterator's SeekPrefixGE returns a synthetic <prefix>@<sstable's max mvcc timestamp> key if the bloom filter indicates the file may contain the key
  • the merging iterator is updated to skip past these synthetic keys (much like it does today for range tombstones)
  • nexting an sstable iterator last positioned by SeekPrefixGE actually performs the seek
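
The steps above could be sketched roughly as follows. This is a hypothetical illustration, not Pebble's actual API: `lazyIter`, `sliceIter`, and `mayContain` are invented names standing in for the sstable iterator, its data blocks, and the bloom filter check.

```go
package main

import "fmt"

// innerIter stands in for a real sstable iterator whose seek loads blocks.
type innerIter interface {
	SeekPrefixGE(prefix string) (string, bool)
	Next() (string, bool)
}

// sliceIter is a toy iterator over sorted keys.
type sliceIter struct {
	keys []string
	i    int
}

func (s *sliceIter) SeekPrefixGE(prefix string) (string, bool) {
	for j, k := range s.keys {
		if len(k) >= len(prefix) && k[:len(prefix)] == prefix {
			s.i = j
			return k, true
		}
	}
	return "", false
}

func (s *sliceIter) Next() (string, bool) {
	s.i++
	if s.i < len(s.keys) {
		return s.keys[s.i], true
	}
	return "", false
}

// lazyIter defers the real (block-loading) seek until the merging
// iterator actually asks for a real key.
type lazyIter struct {
	inner       innerIter
	maxTS       int                      // sstable's max MVCC timestamp (block property)
	mayContain  func(prefix string) bool // bloom filter check (stubbed here)
	pendingSeek string                   // prefix of a deferred seek, "" if none
}

// SeekPrefixGE returns a synthetic "<prefix>@<maxTS>" key without touching
// data blocks, provided the filter says the prefix may be present.
func (it *lazyIter) SeekPrefixGE(prefix string) (string, bool) {
	if !it.mayContain(prefix) {
		return "", false
	}
	it.pendingSeek = prefix
	return fmt.Sprintf("%s@%d", prefix, it.maxTS), true
}

// Next performs the deferred seek the first time it is called after a
// lazy SeekPrefixGE; afterwards it forwards to the inner iterator.
func (it *lazyIter) Next() (string, bool) {
	if p := it.pendingSeek; p != "" {
		it.pendingSeek = ""
		return it.inner.SeekPrefixGE(p)
	}
	return it.inner.Next()
}

func main() {
	it := &lazyIter{
		inner:      &sliceIter{keys: []string{"k@7", "k@3"}},
		maxTS:      9,
		mayContain: func(string) bool { return true },
	}
	k, _ := it.SeekPrefixGE("k")
	fmt.Println(k) // synthetic key: k@9 (no block read performed)
	k, _ = it.Next()
	fmt.Println(k) // deferred seek happened: k@7
}
```

In this sketch, a table whose synthetic key never reaches the top of the merging heap never has its deferred seek executed, which is where the block-read savings would come from.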

If we elevated MVCC timestamps into the *fileMetadata, it seems like we could even avoid some of the bloom filter reads and table loads.
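
One hedged sketch of what that elevation might buy: if the max MVCC timestamp is available on the file metadata (the field and helper names below are hypothetical, not Pebble's), a seek that has already surfaced a real version can discard whole tables without any I/O.

```go
package main

import "fmt"

// fileMetadata is a stand-in for Pebble's *fileMetadata with the max MVCC
// timestamp lifted out of the sstable's block properties, so it is
// available without a table load. Hypothetical sketch only.
type fileMetadata struct {
	maxMVCCTimestamp int
}

// canSkipTableLoad reports whether a SeekPrefixGE that has already
// surfaced a real version at surfacedTS can skip this table entirely:
// every version the table holds is at least as old, so it can never rise
// to the top of the merging heap. (Tie-breaking by level is elided.)
func canSkipTableLoad(m *fileMetadata, surfacedTS int) bool {
	return m.maxMVCCTimestamp <= surfacedTS
}

func main() {
	old := &fileMetadata{maxMVCCTimestamp: 5}
	fresh := &fileMetadata{maxMVCCTimestamp: 9}
	fmt.Println(canSkipTableLoad(old, 7))   // true: no filter read or table load needed
	fmt.Println(canSkipTableLoad(fresh, 7)) // false: this table must be consulted
}
```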

Somewhat related to #2182.

Jira issue: PEBBLE-142

@jbowens
Collaborator Author

jbowens commented Feb 15, 2023

Added the consider-23.2 label because this and avoiding bloom filter block reads (by elevating the mvcc props to the fileMetadata) seem especially beneficial with disaggregated storage, where each new table read requires reading from remote storage.

@sumeerbhola
Collaborator

I'm a bit confused by

the sstable iterator SeekPrefixGE returns a synthetic <prefix>@<sstable's max mvcc timestamp> key if the bloom filter indicates the file may contain the key

since I thought this is trying to avoid reading the bloom filter of that file. Maybe I am misreading the sentence.
Is the idea here that many levels will have files that overlap with the prefix, but the max MVCC timestamp in most of these files will be smaller than the actual largest MVCC timestamp for that prefix, so these files will never rise to the top of the heap and so we will never need to read their bloom filters? Nice idea! It should help when the version being read was written recently, which I believe is more common for workloads with huge tables with lots of old data (which are exactly the ones where the bloom filter is more likely to miss the cache).

@jbowens
Collaborator Author

jbowens commented Feb 15, 2023

since I thought this is trying to avoid reading the bloom filter of that file.

Ah yeah, when I started writing the issue I was thinking we'd perform the bloom filter check before returning the synthetic key and we'd only be saving the block loads on the tables that the filter fails to exclude, but I think it makes more sense to reverse the order to try to avoid the bloom filter check as well.

Is the idea here that many levels will have files that overlap with the prefix, but the max MVCC timestamp in most of these files will be smaller than the actual largest MVCC timestamp for that prefix, so these files will never rise to the top of the heap and so we will never need to read their bloom filters?

Yeah, that's right.

@jbowens
Collaborator Author

jbowens commented Jan 24, 2024

This may be helpful with #3230 for workloads that are frequently reading and updating recently-written keys. It would shift the working set of data towards the higher levels of the LSM (which are also smaller), making it more likely that a read finds that everything it needs is already in either the block cache or OS page cache.

Status: 24.2 candidates

2 participants