Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: snapshot keyspan bounds #1810

Open
jbowens opened this issue Jul 14, 2022 · 4 comments
Open

perf: snapshot keyspan bounds #1810

jbowens opened this issue Jul 14, 2022 · 4 comments
Labels
A-storage A-write-amp potential to reduce write amplification C-enhancement New feature or request T-storage

Comments

@jbowens
Copy link
Collaborator

jbowens commented Jul 14, 2022

Currently, a snapshot always applies universally across all keys in the database. In CockroachDB, snapshots are used to preserve state within the context of a single range. An LSM snapshot constructed to read range r1 still prevents removal of obsolete keys in range r2.

We could extend NewSnapshot to allow supplying a predicate p(k)→bool that configures the snapshot to only snapshot keys for which p returns true. During compaction and flushes, when a snapshot appears to produce a new snapshot stripe within the same key, p(k) is consulted and a new stripe is produced only if p(k)→true. This would allow overwritten keys in a non-snapshotted CockroachDB range to be dropped while preserving overwritten keys in a snapshotted CockroachDB range.

There's still the question of what to do when an iterator constructed through Snapshot.NewIter reads a key for which the predicate p(k)→false:

  1. Do nothing—the result of iterating over keys that do not match p(k) is undefined. The caller must be careful to never use iteration results outside the predicate. If the predicate is defined over swaths of the key space, this may be achieved through setting iterator bounds.
  2. Filter—during iteration, before returning, test p(k), skipping the key if p(k)→false.
  3. Read at higher sequence number—when the iterator is constructed from the snapshot, record two sequence numbers: the current visible sequence number on the database, and the snapshot's sequence number. User keys for which p(k)→true are filtered at the snapshot's sequence number. User keys for which p(k)→false are filtered at the database's visible sequence number.

I expect limiting the scope of active snapshots would reduce write amplification, in particular during periods of heavy rebalancing where there are open LSM snapshots and replicas are being simultaneously removed. Replica removal lays down range deletions, but those range deletions are unable to drop the replica's data. Compaction of these range deletions is still prioritized, because wide range deletions force ingested sstables into higher levels. The result is we suffer unnecessary write amplification moving the removed replica's data and the range tombstone into L6.

If we are to tackle this, I think we might want to expose a very limited interface, at least from the CockroachDB pkg/storage package that meets our specific snapshot usages. This can help avoid the possibility of reading unshapshotted keys while under the impression of reading through a consistent snapshot.

The amount of write amplification saved is still unknown. Adding metrics for the size of obsolete keys preserved during compactions (#1204) would help us prioritize.

Jira issue: PEBBLE-127

@sumeerbhola
Copy link
Collaborator

Can we make this less general, and limit predicates to a single key-span, and check that all iterator bounds are limited to that key-span?

@jbowens
Copy link
Collaborator Author

jbowens commented Jul 18, 2022

I think the fragmentation of a range's various key spaces (range-id, range-local, etc) force us to snapshot multiple key spans together.

@sumeerbhola
Copy link
Collaborator

I think the fragmentation of a range's various key spaces (range-id, range-local, etc) force us to snapshot multiple key spans together.

Ah yes. We could still have it be explicitly represented as a set of spans, yes?

@jbowens
Copy link
Collaborator Author

jbowens commented Jul 18, 2022

Yeah, for sure

@jbowens jbowens added the A-write-amp potential to reduce write amplification label Jul 31, 2022
jbowens added a commit to jbowens/pebble that referenced this issue Jan 22, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Jan 22, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Jan 31, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Feb 2, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Feb 2, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Feb 2, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Feb 3, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Feb 3, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Feb 6, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Feb 8, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Feb 8, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit that referenced this issue Feb 8, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by #2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots #1810).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-storage A-write-amp potential to reduce write amplification C-enhancement New feature or request T-storage
Projects
Status: Backlog
Development

No branches or pull requests

2 participants