Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scan_internal: don't visit empty external files #3384

Merged
merged 1 commit into from
Mar 8, 2024

Conversation

stevendanna
Copy link
Contributor

When truncating external files to our scan bounds, we may find that the file has Start == End. Other validation in ingest asserts these are invalid bounds.

@stevendanna stevendanna marked this pull request as ready for review March 7, 2024 03:07
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @stevendanna)


scan_internal.go line 699 at r1 (raw file):

		for level := firstLevelWithRemote; level < numLevels; level++ {
			files := current.Levels[level].Iter()
			for f := files.SeekGE(cmp, lower); f != nil && cmp(f.Smallest.UserKey, upper) < 0; f = files.Next() {

Shouldn't this guarantee the empty sst doesn't happen? Maybe it's just a corner case where SeekGE returns a file that ends exactly at lower? Feels like we should check that here and continue instead (and replace the check you added with an assertion)

@stevendanna
Copy link
Contributor Author

@RaduBerinde Yes, I believe that is exactly what is happening. The file ends exactly at lower and SeekGE returns it. We try to scan from something like /Table/104/, Table/105 and end up picking up a file for /Table/103,/Table/104.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we should check for that condition first thing in the loop and continue instead. E.g.

		// SeekGE can return a file that ends at lower.
		if f.Largest.IsExclusiveSentinel() && cmp(f.Largest.UserKey, lower) == 0 {
			continue
		}

And we can change the new check you added into an assertion.

@jbowens Is this actually a bug in LevelIterator.SeekGE? Should it do what's below instead?

        m := i.seek(func(m *FileMetadata) bool {
-               return cmp(m.Largest.UserKey, userKey) >= 0
+               c := cmp(m.Largest.UserKey, userKey)
+               return c > 0 || (c == 0 && !m.Largest.IsExclusiveSentinel())

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @itsbilal and @stevendanna)


scan_internal.go line 466 at r1 (raw file):

	cmp := d.cmp
	sst := &ExternalFile{}
	sst.cloneFromFileMeta(file)

NYC but this method is only used here and it's far away from this code. It just makes this code harder to read. I would just inline it and simplify the code (we can set most fields directly and then we can set Start/End to either key).

Copy link
Contributor Author

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Will push this shortly.

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @itsbilal and @RaduBerinde)


scan_internal.go line 466 at r1 (raw file):

Previously, RaduBerinde wrote…

NYC but this method is only used here and it's far away from this code. It just makes this code harder to read. I would just inline it and simplify the code (we can set most fields directly and then we can set Start/End to either key).

:D You are right. I put this in in a previous commit wanting symmetry with SharedSST.cloneFileFromMeta but I also dislike how far away it is.

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a bug, but maybe adjusting the semantics can clean up some code. We'd need to audit all the other callers. Today this SeekGE is only a seek on the largest user key and doesn't imply anything about inclusivity or exclusivity.

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @RaduBerinde, and @stevendanna)

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, it looks like when the key type filtering was added, it implemented the semantics you're suggesting when the filter is not KeyTypePointAndRange?

	if i.filter != KeyTypePointAndRange && m != nil {
		b, ok := m.LargestBound(i.filter)
		if !ok {
			m = i.Next()
		} else if c := cmp(b.UserKey, userKey); c < 0 || c == 0 && b.IsExclusiveSentinel() {
			// This file does not contain any keys of the type ≥ lower. It
			// should be filtered, even though it does contain point keys.
			m = i.Next()
		}
	}

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @RaduBerinde, and @stevendanna)

Copy link
Contributor Author

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 3 unresolved discussions (waiting on @itsbilal and @RaduBerinde)


scan_internal.go line 471 at r1 (raw file):

	sst.Locator = objMeta.Remote.Locator
	needsLowerTruncate := cmp(lower, file.Smallest.UserKey) > 0
	needsUpperTruncate := cmp(upper, file.Largest.UserKey) < 0 || (cmp(upper, file.Largest.UserKey) == 0 && !file.Largest.IsExclusiveSentinel())

Actually, isn't this wrong as well. This ! seems wrong.

Copy link
Contributor Author

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented @RaduBerinde's suggestion. I wasn't sure whether y'all prefer error returns with AssertionFailedf or a panic for this sort of thing.

Reviewable status: 0 of 3 files reviewed, 3 unresolved discussions (waiting on @itsbilal and @RaduBerinde)

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, 5 unresolved discussions (waiting on @itsbilal and @stevendanna)


scan_internal.go line 471 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

Actually, isn't this wrong as well. This ! seems wrong.

Looks like it's wrong but inconsequential.. The user key is the same so it doesn't matter which one we choose.


scan_internal.go line 473 at r2 (raw file):

		HasRangeKey:     file.HasRangeKeys,
		Size:            file.Size,
		SyntheticSuffix: append([]byte(nil), file.SyntheticSuffix...),

[nit] use slices.Clone (same for the bound keys)


scan_internal.go line 495 at r2 (raw file):

	}

	if cmp(sst.Bounds.Start, sst.Bounds.End) == 0 {

<= 0

When truncating external files to our scan bounds, we may find that
the file has Start == End. Other validation in ingest asserts these
are invalid bounds.
@stevendanna stevendanna force-pushed the scan-internal-truncate-bounds branch from a7e634a to ca67095 Compare March 8, 2024 17:50
Copy link
Contributor Author

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, 4 unresolved discussions (waiting on @itsbilal and @RaduBerinde)


scan_internal.go line 473 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] use slices.Clone (same for the bound keys)

Nice, I really need to internalize the new slices package. makes things much better.


scan_internal.go line 495 at r2 (raw file):

Previously, RaduBerinde wrote…

<= 0

I think you meant >=. Also changed the message a bit to line up with some others I found.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Good call on returning it as an error, it is preferable (we'd rather have an operation fail than crash the node).

Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @stevendanna)

@stevendanna stevendanna merged commit 33f8f04 into master Mar 8, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants