-
Notifications
You must be signed in to change notification settings - Fork 466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compaction: elision-only compactions for tables with only range keys #1759
Conversation
Noticed this issue while rebasing #1750 to pick up the compaction support on master. Splitting it out to avoid additional noise on that PR. This would need to land first. Another approach I played around with was updating Another idea was to only use range key dels in the heuristic (i.e. ignore tables with range key unsets that could potentially be elided). I think in this case a range key unset could fall all the way down to L6 and never be deleted, under the right circumstances. Open to other ideas. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good find!
Reviewed all commit messages.
Reviewable status: 0 of 13 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @nicktrav)
compaction_picker.go
line 1249 at r1 (raw file):
// dropped by tombstones and duplicated user keys. See #847. if f.Stats.RangeDeletionsBytesEstimate*10 < f.Size && f.Stats.NumDeletions*10 < f.Stats.NumEntries {
wdyt about about making this <=
instead, with a comment explaining that NumEntries
may be zero if the table only contains range keys?
If we want elision-only compactions to trigger for the purpose of eliding range key unsets/dels (maybe?), I think we should give some thought to the heuristic that we want to trigger with. Range keys are expected to be few and small, since in CockroachDB's case they have no user-provided values. Space amplification isn't much of a concern. The main reason to want to elide their deletions is to remove potentially wide ranges that might force ingests into higher levels. I'm not sure that's much of an issue, because the compaction into a table just containing a few range keys will be very cheap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good find!
Reviewed all commit messages.
Reviewable status: 0 of 13 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @nicktrav)
compaction_picker.go
line 1249 at r1 (raw file):
// dropped by tombstones and duplicated user keys. See #847. if f.Stats.RangeDeletionsBytesEstimate*10 < f.Size && f.Stats.NumDeletions*10 < f.Stats.NumEntries {
wdyt about about making this <=
instead, with a comment explaining that NumEntries
may be zero if the table only contains range keys?
If we want elision-only compactions to trigger for the purpose of eliding range key unsets/dels (maybe?), I think we should give some thought to the heuristic that we want to trigger with. Range keys are expected to be few and small, since in CockroachDB's case they have no user-provided values. Space amplification isn't much of a concern. The main reason to want to elide their deletions is to remove potentially wide ranges that might force ingests into higher levels. I'm not sure that's much of an issue, because the compaction into a table just containing a few range keys will be very cheap.
Currently, a table is marked as eligible for elision-only compaction by the `elisionOnlyAnnotator` under the following circumstances: - the table's range deletion estimate is greater than or equal to 10% of the total table size, OR - the number of deletions is greater than or equal to 10% of the table's total point key entries. If a table contains only range keys, the second predicate is true (given that `0 >= 0`), scheduling an elision-only compaction. Howeve, if the table contains only range key-sets, such keys cannot be elided, and the compaction picker will continue to schedule the table for elision, without effect. This can tie up compaction slots. While it is _technically_ possible that a table with containing exclusively range keys, but no range key sets _could_ be eligible for an elision-only compaction (i.e. if there are no spans underneath, or snapshots preventing the elision, etc.), the utility of such a compaction is minimal, given that a compaction into a table containing a few range keys would be inexpensive. Tweak the elision-only compaction heuristics to skip elision-only compactions of tables that contain exclusively range keys.
2d7219a
to
1022ff8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 13 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @jbowens)
compaction_picker.go
line 1249 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
wdyt about about making this
<=
instead, with a comment explaining thatNumEntries
may be zero if the table only contains range keys?If we want elision-only compactions to trigger for the purpose of eliding range key unsets/dels (maybe?), I think we should give some thought to the heuristic that we want to trigger with. Range keys are expected to be few and small, since in CockroachDB's case they have no user-provided values. Space amplification isn't much of a concern. The main reason to want to elide their deletions is to remove potentially wide ranges that might force ingests into higher levels. I'm not sure that's much of an issue, because the compaction into a table just containing a few range keys will be very cheap.
SGTM. Updated the heuristic to <=
and dropped in a TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
Reviewed 2 of 13 files at r1, 2 of 11 files at r2.
Reviewable status: 4 of 13 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @jbowens)
TFTR! |
Currently, a table is marked as eligible for elision-only compaction by
the
elisionOnlyAnnotator
under the following circumstances:the total table size, OR
total point key entries.
If a table contains only range keys, the second predicate is true (given
that
0 >= 0
). If the table contains a range key delete or a range keyunset, an elision-only compaction is possible (i.e. if there are no
spans underneath, or snapshots preventing the elision, etc.).
However, if the table contains just range key sets, the table is always
marked for elision, but an elision-only compaction has no effect. The
same elision-only compaction will continue to be scheduled.
Update the elision-only compaction heuristics to only schedule such
compactions when there is at least one range deletion, range key
deletion, or range key unset. This requires storing an additional uint
on the table stats, which allows determining whether a table contains
only range key sets.