Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compaction: elision-only compactions for tables with only range keys #1759

Merged
merged 1 commit into from
Jun 14, 2022

Conversation

nicktrav
Copy link
Contributor

Currently, a table is marked as eligible for elision-only compaction by
the elisionOnlyAnnotator under the following circumstances:

  • the table's range deletion estimate is greater than or equal to 10% of
    the total table size, OR
  • the number of deletions is greater than or equal to 10% of the table's
    total point key entries.

If a table contains only range keys, the second predicate is true (given
that 0 >= 0). If the table contains a range key delete or a range key
unset, an elision-only compaction is possible (i.e. if there are no
spans underneath, or snapshots preventing the elision, etc.).

However, if the table contains just range key sets, the table is always
marked for elision, but an elision-only compaction has no effect. The
same elision-only compaction will continue to be scheduled.

Update the elision-only compaction heuristics to only schedule such
compactions when there is at least one range deletion, range key
deletion, or range key unset. This requires storing an additional uint
on the table stats, which allows determining whether a table contains
only range key sets.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nicktrav nicktrav marked this pull request as ready for review June 11, 2022 19:42
@nicktrav nicktrav requested review from itsbilal and jbowens June 11, 2022 19:42
@nicktrav
Copy link
Contributor Author

Noticed this issue while rebasing #1750 to pick up the compaction support on master. Splitting it out to avoid additional noise on that PR. This would need to land first.

Another approach I played around with was updating TableStats.NumEntries to take into account range keys. That seems to break a bunch of things that rely on this being a point key only number (i.e. disk estimates currently only take into account point keys, and this can result in a div-by-zero panic if the table contains only point keys). In this case we'd still need a way of disambiguating range-key-set-only tables, which shouldn't be elided.

Another idea was to only use range key dels in the heuristic (i.e. ignore tables with range key unsets that could potentially be elided). I think in this case a range key unset could fall all the way down to L6 and never be deleted, under the right circumstances.

Open to other ideas.

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find!

Reviewed all commit messages.
Reviewable status: 0 of 13 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @nicktrav)


compaction_picker.go line 1249 at r1 (raw file):

	// dropped by tombstones and duplicated user keys. See #847.
	if f.Stats.RangeDeletionsBytesEstimate*10 < f.Size &&
		f.Stats.NumDeletions*10 < f.Stats.NumEntries {

wdyt about about making this <= instead, with a comment explaining that NumEntries may be zero if the table only contains range keys?

If we want elision-only compactions to trigger for the purpose of eliding range key unsets/dels (maybe?), I think we should give some thought to the heuristic that we want to trigger with. Range keys are expected to be few and small, since in CockroachDB's case they have no user-provided values. Space amplification isn't much of a concern. The main reason to want to elide their deletions is to remove potentially wide ranges that might force ingests into higher levels. I'm not sure that's much of an issue, because the compaction into a table just containing a few range keys will be very cheap.

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find!

Reviewed all commit messages.
Reviewable status: 0 of 13 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @nicktrav)


compaction_picker.go line 1249 at r1 (raw file):

	// dropped by tombstones and duplicated user keys. See #847.
	if f.Stats.RangeDeletionsBytesEstimate*10 < f.Size &&
		f.Stats.NumDeletions*10 < f.Stats.NumEntries {

wdyt about about making this <= instead, with a comment explaining that NumEntries may be zero if the table only contains range keys?

If we want elision-only compactions to trigger for the purpose of eliding range key unsets/dels (maybe?), I think we should give some thought to the heuristic that we want to trigger with. Range keys are expected to be few and small, since in CockroachDB's case they have no user-provided values. Space amplification isn't much of a concern. The main reason to want to elide their deletions is to remove potentially wide ranges that might force ingests into higher levels. I'm not sure that's much of an issue, because the compaction into a table just containing a few range keys will be very cheap.

Currently, a table is marked as eligible for elision-only compaction by
the `elisionOnlyAnnotator` under the following circumstances:

- the table's range deletion estimate is greater than or equal to 10% of
  the total table size, OR
- the number of deletions is greater than or equal to 10% of the table's
  total point key entries.

If a table contains only range keys, the second predicate is true (given
that `0 >= 0`), scheduling an elision-only compaction. Howeve, if the
table contains only range key-sets, such keys cannot be elided, and the
compaction picker will continue to schedule the table for elision,
without effect. This can tie up compaction slots.

While it is _technically_ possible that a table with containing
exclusively range keys, but no range key sets _could_ be eligible for an
elision-only compaction (i.e. if there are no spans underneath, or
snapshots preventing the elision, etc.), the utility of such a
compaction is minimal, given that a compaction into a table containing a
few range keys would be inexpensive.

Tweak the elision-only compaction heuristics to skip elision-only
compactions of tables that contain exclusively range keys.
@nicktrav nicktrav force-pushed the nickt.range-key-elision branch from 2d7219a to 1022ff8 Compare June 13, 2022 15:38
Copy link
Contributor Author

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 13 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @jbowens)


compaction_picker.go line 1249 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

wdyt about about making this <= instead, with a comment explaining that NumEntries may be zero if the table only contains range keys?

If we want elision-only compactions to trigger for the purpose of eliding range key unsets/dels (maybe?), I think we should give some thought to the heuristic that we want to trigger with. Range keys are expected to be few and small, since in CockroachDB's case they have no user-provided values. Space amplification isn't much of a concern. The main reason to want to elide their deletions is to remove potentially wide ranges that might force ingests into higher levels. I'm not sure that's much of an issue, because the compaction into a table just containing a few range keys will be very cheap.

SGTM. Updated the heuristic to <= and dropped in a TODO.

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Nice catch!

Reviewed 2 of 13 files at r1, 2 of 11 files at r2.
Reviewable status: 4 of 13 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @jbowens)

@nicktrav
Copy link
Contributor Author

TFTR!

@nicktrav nicktrav merged commit ef1ca57 into cockroachdb:master Jun 14, 2022
@nicktrav nicktrav deleted the nickt.range-key-elision branch June 14, 2022 14:45
@nicktrav nicktrav mentioned this pull request Jun 16, 2022
29 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants