storage: limit poor LSM health degradation via last line of defense protections in Pebble #79956
Comments
The current comment mentions that the threshold is used relative to L0 sublevel count only when the `Experimental.L0SublevelCompactions` option is enabled. As of 32a5180, the L0 sublevel count is always used. Update the comment. Touches cockroachdb/cockroach#79956.
Have we considered being more targeted and failing/pausing liveness heartbeats under these stop conditions? We already do this (see cockroach/pkg/kv/kvserver/liveness/liveness.go, lines 1257–1265 at 0b7ba56) and could strengthen it by more aggressively skipping heartbeats when Pebble indicates overload.

My default opinion is that engine backpressure is not a good idea. It is much more likely than targeted backpressure to have adverse effects across the cluster. For #79215, we are also discussing removing the below-raft throttling that exists today in favor of better mechanisms.

I understand the desire to have a catch-all "fallback", but is it clear that the resulting behavior is better? We have a history of putting in mitigations that we don't understand well, and I would like to make sure we're not repeating that here. Interested in any conversations that I've missed and experiments that were or could be run.

Assuming nodes reliably fail their heartbeats while in an L0-stop regime, the main source of incoming writes is raft messages (anything else?). I would be open to exploring ways to delay raft handling (dropping incoming messages is the "easiest" thing we could do; with a bit more elbow grease we could probably properly "pause" replication). Or, put differently: if we solve the "follower writes" problem and "backpressure" heartbeats, do we have the catch-all that we're after?
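A minimal sketch of the "skip heartbeats on engine overload" idea from the comment above, using Pebble's exported metrics as the overload signal. The threshold constant, the helper names, and the choice to fail the heartbeat outright are illustrative assumptions, not the existing liveness code:

```go
package overload

import (
	"errors"

	"github.com/cockroachdb/pebble"
)

// l0SublevelOverloadThreshold mirrors the ~20-sublevel point at which
// admission control considers a store overloaded. Illustrative constant.
const l0SublevelOverloadThreshold = 20

// engineOverloaded reports whether the store's L0 shape suggests the node
// should stop advertising liveness, based on Pebble's exported metrics.
func engineOverloaded(m *pebble.Metrics) bool {
	return int(m.Levels[0].Sublevels) >= l0SublevelOverloadThreshold
}

var errEngineOverloaded = errors.New("skipping liveness heartbeat: L0 overloaded")

// maybeHeartbeat fails the heartbeat (shedding leases and load from this
// node) rather than stalling writes engine-wide. sendHeartbeat stands in
// for the real liveness heartbeat path.
func maybeHeartbeat(db *pebble.DB, sendHeartbeat func() error) error {
	if engineOverloaded(db.Metrics()) {
		return errEngineOverloaded
	}
	return sendHeartbeat()
}
```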
I was thinking about our […]. I think it's premature to consider hard engine-wide limits before we've made […]. I know we're already planning on adaptively tuning […]. Linking @sumeerbhola's […] for a lot more nuanced thinking on the compaction concurrency limits.
Currently, Cockroach relies heavily on Admission Control (AC) to protect the store on a node from falling into a "poor health" regime (most typically "LSM inversion", created when a large amount of data is resident in L0 across multiple sublevels, contributing to high read amplification). AC typically "knows more" about the operations being executed against a store and can therefore make better decisions about what to allow or throttle than, say, Pebble itself.
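For concreteness, the L0 shape that characterizes an inverted LSM is visible in Pebble's metrics. A small sketch (field and method names as in recent Pebble versions; the function name is hypothetical) that surfaces the relevant numbers:

```go
package lsmhealth

import (
	"fmt"

	"github.com/cockroachdb/pebble"
)

// reportL0Health prints the L0 statistics that characterize an inverted LSM:
// many files spread across many sublevels in L0 means every read must merge
// across all of them, i.e. high read amplification.
func reportL0Health(db *pebble.DB) {
	m := db.Metrics()
	l0 := m.Levels[0]
	fmt.Printf("L0: files=%d sublevels=%d bytes=%d\n", l0.NumFiles, l0.Sublevels, l0.Size)
	// Read amplification is roughly one per L0 sublevel plus one per
	// non-empty lower level.
	fmt.Printf("read amplification: %d\n", m.ReadAmp())
}
```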
However, there exist gaps in AC today that can result in poor LSM health (sometimes significant, as evidenced by a recent example, #79115). These poor health situations are typically difficult to recover from without invasive operator intervention (taking the affected node offline and performing an often lengthy manual compaction, or decommissioning the node entirely).
As a "last line of defense", Pebble could back-pressure or stall writes entirely to prevent a further degradation of health.
Currently, a hard limit on L0 sublevels exists, defaulting to 12 in Pebble and 1,000 in Cockroach. For comparison, AC currently considers a store overloaded at 20 L0 sublevels and 1,000 L0 files.
It should be noted that employing hard limits in Pebble as a last line of defense comes with some risks: writes below raft would be halted, and the failure modes would be less graceful (compared to AC, which still allows some progress in a store-overload situation).
Any tuning of hard limits in Pebble should come with associated dynamically adjustable tuning knobs, allowing an operator to raise (or lower) the limits based on the circumstances.
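As a rough sketch of what a configurable hard limit could look like at store-open time: the limit corresponds to Pebble's existing `L0StopWritesThreshold` option (default 12, overridden to 1,000 by Cockroach today). The environment variable, its default, and the store path below are illustrative stand-ins for a real operator-facing knob (e.g. a cluster setting), and truly dynamic adjustment would additionally require Pebble to accept a new value on an open DB:

```go
package main

import (
	"log"
	"os"
	"strconv"

	"github.com/cockroachdb/pebble"
)

// l0StopWritesThreshold returns the operator-configured hard limit on L0
// sublevels, falling back to a permissive default. The env var name and the
// default of 1000 (Cockroach's current override of Pebble's default of 12)
// are illustrative.
func l0StopWritesThreshold() int {
	if v := os.Getenv("COCKROACH_PEBBLE_L0_STOP_WRITES"); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return 1000
}

func main() {
	opts := &pebble.Options{
		// Compactions out of L0 begin well before the hard stop.
		L0CompactionThreshold: 4,
		// Writes are stopped entirely once L0 reaches this many sublevels.
		L0StopWritesThreshold: l0StopWritesThreshold(),
	}
	db, err := pebble.Open("demo-store", opts) // path is illustrative
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```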
The following provide some useful background reading on the topic:
Jira issue: CRDB-15710