Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: shared ingestion fixes #3074

Merged
merged 3 commits into from
Nov 15, 2023

Conversation

itsbilal
Copy link
Contributor

@itsbilal itsbilal commented Nov 15, 2023

Contains 3 commits, each of which contains some fixes around shared/foreign sstables.

*: set hideObsoletePoints=true for shared foreign sstables

Previously, we weren't setting hideObsoletePoints to true for
compaction iters created out of shared foreign sstables. This
change addresses that by passing that state through virtualState
in VirtualReader through to the old Reader.

Unblocks #3044.

ingest: don't call checkVirtualBounds() in ingestion

Previously, we were calling checkVirtualBounds on FileMetadatas
before they were installed. This would lead to panics in invariant
builds as we would try to load an uninstalled FileMetadata
from the table cache.

This change removes the checkVirtualBounds call, making it a test-only
thing. Adding a check after version installation also doesn't guarantee
that refs will be nonzero, as a compaction immediately after ingestApply
is possible.

Only an issue with invariants builds. Unblocks #3044.

scan_internal: Sort keys by kind before calling visitor

Previously, we were passing range keys that were sorted by trailer
descending right from ScanInternal. However, the caller isn't
interested in seqnum-based sorting, only kind-based trailer sorting.
This change zeroes seqnums before sorting the keys.

Unblocks #3044.

@itsbilal itsbilal requested review from sumeerbhola and a team November 15, 2023 17:16
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 9 files reviewed, 2 unresolved discussions (waiting on @itsbilal and @sumeerbhola)


table_cache.go line 204 at r1 (raw file):

}

func createCommonReader(

It would be useful to document what isForeign does exactly here.

It's kind of unfortunate to add it to this method which is supposed to hide away the virtual vs non-virtual differences.. Maybe this flag belongs in FileMetadata, alongside Virtual?

Is there a description of hideObsoletePoints somewhere?


data_test.go line 1158 at r1 (raw file):

	props := r.Properties.String()
	if m != nil && m.Virtual {
		v = sstable.MakeVirtualReader(r, m.VirtualMeta(), false /* isForeign */)

How do we know it's not foreign? I assume it doesn't matter here, but then the flag shouldn't be isForeign, it should be more descriptive of what it actually does.

Previously, we weren't setting hideObsoletePoints to true for
compaction iters created out of shared foreign sstables. This
change addresses that by passing that state through virtualState
in VirtualReader through to the old Reader.

Unblocks cockroachdb#3044.
Previously, we were calling checkVirtualBounds on FileMetadatas
before they were installed. This would lead to panics in invariant
builds as we would try to load an uninstalled FileMetadata
from the table cache.

This change removes the checkVirtualBounds call, making it a test-only
thing. Adding a check after version installation also doesn't guarantee
that refs will be nonzero, as a compaction immediately after ingestApply
is possible.

Only an issue with invariants builds. Unblocks cockroachdb#3044.
Previously, we were passing range keys that were sorted by trailer
descending right from ScanInternal. However, the caller isn't
interested in seqnum-based sorting, only kind-based trailer sorting.
This change zeroes seqnums before sorting the keys.

Unblocks cockroachdb#3044.
@itsbilal itsbilal force-pushed the shared-ingest-compaction-fix branch from 02c9ac6 to a44b6d2 Compare November 15, 2023 18:00
Copy link
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: 0 of 9 files reviewed, 1 unresolved discussion (waiting on @RaduBerinde and @sumeerbhola)


table_cache.go line 204 at r1 (raw file):

Previously, RaduBerinde wrote…

It would be useful to document what isForeign does exactly here.

It's kind of unfortunate to add it to this method which is supposed to hide away the virtual vs non-virtual differences.. Maybe this flag belongs in FileMetadata, alongside Virtual?

Is there a description of hideObsoletePoints somewhere?

Added a comment. And I do have a bigger refactor in mind where commonReader just becomes an interface, and this method encapsulates the isForeign determination logic instead of having the caller do it. I'll try to do that later; this is a minimal change to fix the bugs.

And since IsForeign is a state variable that's derived from stuff that's already stored in the catalog, I think it's fair to just leave it as a calculated field instead of storing it in the FileMetadata. Though I'm open to either argument. For the sake of making this backportable, we should probably just pass in isForeign?

hideObsoletePoints is sort of defined in reader.go around the NewIter* methods, but not in great detail.


data_test.go line 1158 at r1 (raw file):

Previously, RaduBerinde wrote…

How do we know it's not foreign? I assume it doesn't matter here, but then the flag shouldn't be isForeign, it should be more descriptive of what it actually does.

Yeah it's just a test helper function, and these tests don't create foreign SSTs, only TestIngestShared and TestScanInternal etc do.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 0 of 9 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @sumeerbhola)


table_cache.go line 204 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Added a comment. And I do have a bigger refactor in mind where commonReader just becomes an interface, and this method encapsulates the isForeign determination logic instead of having the caller do it. I'll try to do that later; this is a minimal change to fix the bugs.

And since IsForeign is a state variable that's derived from stuff that's already stored in the catalog, I think it's fair to just leave it as a calculated field instead of storing it in the FileMetadata. Though I'm open to either argument. For the sake of making this backportable, we should probably just pass in isForeign?

hideObsoletePoints is sort of defined in reader.go around the NewIter* methods, but not in great detail.

I guess this is where I'm coming from: it doesn't feel right that we should tie semantics to whether a file is external or not, even though these are correlated in the current usecases.. It should be something that is provided separately when we attach the file. Maybe one day we will want to attach an external file as a "regular" SST - e.g. starting a "cloned cluster" from saved object files.

This may or may not make sense because I still don't understand the hideObsoletePoints semantics well.

Anyway this PR is ok to merge for now, just trying to figure out the desired long term state.

Copy link
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 9 files reviewed, 1 unresolved discussion (waiting on @RaduBerinde and @sumeerbhola)


table_cache.go line 204 at r1 (raw file):

Previously, RaduBerinde wrote…

I guess this is where I'm coming from: it doesn't feel right that we should tie semantics to whether a file is external or not, even though these are correlated in the current usecases.. It should be something that is provided separately when we attach the file. Maybe one day we will want to attach an external file as a "regular" SST - e.g. starting a "cloned cluster" from saved object files.

This may or may not make sense because I still don't understand the hideObsoletePoints semantics well.

Anyway this PR is ok to merge for now, just trying to figure out the desired long term state.

Yeah I can see that. Currently we've tied "should we only read the newest internal key for each user key" (which is what hideObsoletePoints really does), to "is this file shared foreign" (which btw is not the same as external), and does make sense to have them as two separate things. It also makes sense to have a FileMetadata field for this, seeing as that's usually the place to interpret how to read a file, while the catalog ObjectMetadata mostly speaks about where the file is.

TFTR!

@itsbilal itsbilal merged commit 038559e into cockroachdb:master Nov 15, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants