Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sstable: add block level synthetic prefix support #3237

Closed
wants to merge 1 commit into from

Conversation

dt
Copy link
Member

@dt dt commented Jan 19, 2024

This adds the concept of synthetic prefixes to sstable readers and their underlying block iterators, treating the synthetic prefix as an extra shared prefix, shared even by restart keys.

When a reader is configured with a synthetic prefix, it will assume that that prefix is implicitly prepended to every key in the underlying sst blocks when interacting with or returning those keys.

@dt dt requested review from RaduBerinde and petermattis January 19, 2024 19:17
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 0 of 16 files reviewed, 3 unresolved discussions (waiting on @dt, @msbutler, and @petermattis)


sstable/block.go line 414 at r1 (raw file):

	}
	hideObsoletePoints bool
	SyntheticPrefix

[nit] we should avoid embedding unless we want to specifically inherit the methods


sstable/block.go line 644 at r1 (raw file):

		return base.CorruptionErrorf("pebble/table: invalid firstKey in block")
	}
	if i.SyntheticPrefix != nil {

Should we use a reusable buffer (similar to fullKey) to avoid this alloc when restForReuse is used?


sstable/reader.go line 174 at r1 (raw file):

}

type SyntheticPrefix []byte

[nit] will need some comments

sstable/block.go Outdated Show resolved Hide resolved
This adds the concept of synthetic prefixes to sstable readers and their
underlying block iterators, treating the synthetic prefix as an extra
shared prefix, shared even by restart keys.

When a reader is configured with a synthetic prefix, it will assume that
that prefix is implicitly prepended to every key in the underlying sst
blocks when interacting with or returning those keys.
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 16 files reviewed, 8 unresolved discussions (waiting on @dt and @petermattis)


sstable/block.go line 647 at r2 (raw file):

	}
	if i.prefix != nil {
		i.firstUserKey = append(append(i.firstUserKeyWithPrefix[:0], i.prefix...), i.firstUserKey...)

nit: the naming here is confusing. I thought firstUserKey was without the prefix, but it seems it is also with prefix, and firstUserKeyWithPrefix is really firstUserKeyWithPrefixBacking.


sstable/block.go line 704 at r2 (raw file):

// SeekGE implements internalIterator.SeekGE, as documented in the pebble
// package.
func (i *blockIter) SeekGE(key []byte, flags base.SeekGEFlags) (*InternalKey, base.LazyValue) {

what's the effect of these changes on benchmarks that don't have a prefix?


sstable/block.go line 863 at r2 (raw file):

		if i.prefix != nil {
			if !bytes.HasPrefix(key, i.prefix) {
				if i.cmp(i.prefix, key) < 0 {

where do we say that Compare needs to be defined for a prefix and not just user keys? For example EngineKeyCompare (the CRDB implementation of Compare) is explicitly looking for the separator between the prefix and the suffix.

@RaduBerinde
Copy link
Member

table_cache.go line 884 at r2 (raw file):

	pprof.Do(context.Background(), tableCacheLabels, func(context.Context) {
		var prefix []byte
		if meta.PrefixReplacement != nil && len(meta.PrefixReplacement.ContentPrefix) == 0 {

There is something subtle going on here. We only cache physical SST readers. We are pushing down the prefix replacement rules from the virtual SST into the physical reader (which can be shared among all virtual SSTs).

In principle we can currently have multiple virtual SSTs with the same backing SST that have different prefix replacement rules. I believe we don't need that functionality so we should require (and enforce) that all virtual SSTs with the same backing file have the same replacement rules. Once we do that, it's ok to push down the rules into the physical reader. (but we should still add an assertion that these things match when we use a cached physical reader).

This is an existing issue w.r.t SmallestSeqNum/LargestSeqNum below.

@RaduBerinde
Copy link
Member

table_cache.go line 884 at r2 (raw file):

Previously, RaduBerinde wrote…

There is something subtle going on here. We only cache physical SST readers. We are pushing down the prefix replacement rules from the virtual SST into the physical reader (which can be shared among all virtual SSTs).

In principle we can currently have multiple virtual SSTs with the same backing SST that have different prefix replacement rules. I believe we don't need that functionality so we should require (and enforce) that all virtual SSTs with the same backing file have the same replacement rules. Once we do that, it's ok to push down the rules into the physical reader. (but we should still add an assertion that these things match when we use a cached physical reader).

This is an existing issue w.r.t SmallestSeqNum/LargestSeqNum below.

Sounds like we can't assume all virtual SSTs will have the same suffix replacement rules (we would like these to be plumbed together). Maybe we can pass these rules only when we create iterators (from VirtualReader)?

@msbutler msbutler self-assigned this Feb 22, 2024
@msbutler
Copy link
Contributor

moving pr to #3350

@msbutler msbutler closed this Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants