-
Notifications
You must be signed in to change notification settings - Fork 466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scan_internal: don't visit empty external files #3384
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @stevendanna)
scan_internal.go
line 699 at r1 (raw file):
for level := firstLevelWithRemote; level < numLevels; level++ { files := current.Levels[level].Iter() for f := files.SeekGE(cmp, lower); f != nil && cmp(f.Smallest.UserKey, upper) < 0; f = files.Next() {
Shouldn't this guarantee the empty sst doesn't happen? Maybe it's just a corner case where SeekGE
returns a file that ends exactly at lower? Feels like we should check that here and continue
instead (and replace the check you added with an assertion)
@RaduBerinde Yes, I believe that is exactly what is happening. The file ends exactly at lower and SeekGE returns it. We try to scan from something like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we should check for that condition first thing in the loop and continue
instead. E.g.
// SeekGE can return a file that ends at lower.
if f.Largest.IsExclusiveSentinel() && cmp(f.Largest.UserKey, lower) == 0 {
continue
}
And we can change the new check you added into an assertion.
@jbowens Is this actually a bug in LevelIterator.SeekGE
? Should it do what's below instead?
m := i.seek(func(m *FileMetadata) bool {
- return cmp(m.Largest.UserKey, userKey) >= 0
+ c := cmp(m.Largest.UserKey, userKey)
+ return c > 0 || (c == 0 && !m.Largest.IsExclusiveSentinel())
Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @itsbilal and @stevendanna)
scan_internal.go
line 466 at r1 (raw file):
cmp := d.cmp sst := &ExternalFile{} sst.cloneFromFileMeta(file)
NYC but this method is only used here and it's far away from this code. It just makes this code harder to read. I would just inline it and simplify the code (we can set most fields directly and then we can set Start/End to either key).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Will push this shortly.
Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @itsbilal and @RaduBerinde)
scan_internal.go
line 466 at r1 (raw file):
Previously, RaduBerinde wrote…
NYC but this method is only used here and it's far away from this code. It just makes this code harder to read. I would just inline it and simplify the code (we can set most fields directly and then we can set Start/End to either key).
:D You are right. I put this in in a previous commit wanting symmetry with SharedSST.cloneFileFromMeta but I also dislike how far away it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a bug, but maybe adjusting the semantics can clean up some code. We'd need to audit all the other callers. Today this SeekGE is only a seek on the largest user key and doesn't imply anything about inclusivity or exclusivity.
Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @RaduBerinde, and @stevendanna)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, it looks like when the key type filtering was added, it implemented the semantics you're suggesting when the filter is not KeyTypePointAndRange
?
if i.filter != KeyTypePointAndRange && m != nil {
b, ok := m.LargestBound(i.filter)
if !ok {
m = i.Next()
} else if c := cmp(b.UserKey, userKey); c < 0 || c == 0 && b.IsExclusiveSentinel() {
// This file does not contain any keys of the type ≥ lower. It
// should be filtered, even though it does contain point keys.
m = i.Next()
}
}
Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @RaduBerinde, and @stevendanna)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, 3 unresolved discussions (waiting on @itsbilal and @RaduBerinde)
scan_internal.go
line 471 at r1 (raw file):
sst.Locator = objMeta.Remote.Locator needsLowerTruncate := cmp(lower, file.Smallest.UserKey) > 0 needsUpperTruncate := cmp(upper, file.Largest.UserKey) < 0 || (cmp(upper, file.Largest.UserKey) == 0 && !file.Largest.IsExclusiveSentinel())
Actually, isn't this wrong as well. This !
seems wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented @RaduBerinde's suggestion. I wasn't sure whether y'all prefer error returns with AssertionFailedf or a panic for this sort of thing.
Reviewable status: 0 of 3 files reviewed, 3 unresolved discussions (waiting on @itsbilal and @RaduBerinde)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 3 files reviewed, 5 unresolved discussions (waiting on @itsbilal and @stevendanna)
scan_internal.go
line 471 at r1 (raw file):
Previously, stevendanna (Steven Danna) wrote…
Actually, isn't this wrong as well. This
!
seems wrong.
Looks like it's wrong but inconsequential.. The user key is the same so it doesn't matter which one we choose.
scan_internal.go
line 473 at r2 (raw file):
HasRangeKey: file.HasRangeKeys, Size: file.Size, SyntheticSuffix: append([]byte(nil), file.SyntheticSuffix...),
[nit] use slices.Clone (same for the bound keys)
scan_internal.go
line 495 at r2 (raw file):
} if cmp(sst.Bounds.Start, sst.Bounds.End) == 0 {
<= 0
When truncating external files to our scan bounds, we may find that the file has Start == End. Other validation in ingest asserts these are invalid bounds.
a7e634a
to
ca67095
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 3 files reviewed, 4 unresolved discussions (waiting on @itsbilal and @RaduBerinde)
scan_internal.go
line 473 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] use slices.Clone (same for the bound keys)
Nice, I really need to internalize the new slices package. makes things much better.
scan_internal.go
line 495 at r2 (raw file):
Previously, RaduBerinde wrote…
<= 0
I think you meant >=
. Also changed the message a bit to line up with some others I found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call on returning it as an error, it is preferable (we'd rather have an operation fail than crash the node).
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @stevendanna)
When truncating external files to our scan bounds, we may find that the file has Start == End. Other validation in ingest asserts these are invalid bounds.