Revisit default disk watermark on different data tiers #81406
Pinging @elastic/es-data-management (Team:Data Management)
One thing that would make this difficult is that these are dynamic settings, and we currently don't have the Settings functionality to set a particular value on only a subset of nodes. This means we could have separate defaults per tier, but if the user ever dynamically changed one of the settings, it would change for all tiers. I'm adding the core/infra label, since they'd need to help us with the Settings infrastructure if we wanted to support something like this fully.
Pinging @elastic/es-core-infra (Team:Core/Infra)
There was a discussion that this would be an operator-only setting on Cloud once the default settings are sensible.
That helps a bit. I think it'd still be good to think about where we want to go with tier-specific settings in the future, both for on-prem users and for other settings (possibly non-operator-only ones) that we may want to set to different values per tier (for example, we already do this with a recovery setting: #68480).
Pinging @elastic/es-distributed (Team:Distributed)
I wonder if we really need more control here, as opposed to changing how Elasticsearch computes its disk watermarks to better utilise larger nodes. Today ES (roughly) aims to keep 10% of the disk free on each node, but if we, say, changed this to target …
I agree with David here; doing something similar to what we did for the frozen tier, but on all tiers, would in my mind effectively solve this problem. Making it tier-specific is not enough/ideal during a rolling up-size anyway.
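For illustration, here is a minimal standalone sketch of the approach being discussed: keep the percentage-based free-space target, but cap it with an absolute headroom so that very large disks do not reserve disproportionately much space. The class name and the numbers are illustrative only; this is not Elasticsearch code.

```java
// Sketch only: caps a percentage-based free-space target with an absolute headroom.
// Class name and values are illustrative; this is not Elasticsearch code.
public class WatermarkHeadroomSketch {

    // Required free bytes for a node: the percentage-based target (e.g. 10% free
    // for a 90% used watermark), capped by an absolute max headroom.
    static long requiredFreeBytes(long totalBytes, double freeFraction, long maxHeadroomBytes) {
        long relative = (long) (totalBytes * freeFraction);
        return Math.min(relative, maxHeadroomBytes);
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long tb = 1024L * gb;
        // A 10TB disk with a 10% free target and a 150GB cap reserves only 150GB...
        System.out.println(requiredFreeBytes(10 * tb, 0.10, 150 * gb) / gb + " GB");
        // ...while a 100GB disk is unaffected by the cap: 10% = 10GB is still required.
        System.out.println(requiredFreeBytes(100 * gb, 0.10, 150 * gb) / gb + " GB");
    }
}
```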
Supporting a rudimentary syntax of either: |
From conversing with @henningandersen, we could use a 20GB headroom for the flood_stage watermark, 100GB for high, and 150GB for low.
@henningandersen, as we said, we could add a max_headroom setting (similar to the existing …)
Team, as mentioned in my comment above, I plan to start looking at how to implement the …
In implementing this, I am thinking of adopting @henningandersen's new RelativeByteSizeValue.java. It was only introduced for the frozen-tier flood stage watermark and headroom, but I believe it makes sense to adopt it in this issue for all disk watermarks and to introduce the max headrooms in a similar way. I believe this will simplify the code (which currently tends to check separately for thresholds expressed as exact bytes or as percentages) without breaking backwards compatibility. Please speak up if you have any concerns about this.
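For readers following along, here is a simplified standalone model of what a "relative or absolute" byte-size value means in this context: the value is either a percentage of the total disk or a fixed number of bytes. The real RelativeByteSizeValue class differs in its details; this sketch only approximates the idea.

```java
// Simplified, standalone approximation of a value that is either relative (a
// percentage of the total) or absolute (a fixed byte size). The real
// RelativeByteSizeValue class in Elasticsearch differs in detail; this is a sketch.
public class RelativeOrAbsoluteSketch {
    private final Double ratio;        // e.g. 0.90 for "90%"
    private final Long absoluteBytes;  // e.g. 161061273600L for "150gb"

    private RelativeOrAbsoluteSketch(Double ratio, Long absoluteBytes) {
        this.ratio = ratio;
        this.absoluteBytes = absoluteBytes;
    }

    // Naive parser for the sketch: accepts "NN%" or "NNgb" only.
    static RelativeOrAbsoluteSketch parse(String value) {
        if (value.endsWith("%")) {
            double percent = Double.parseDouble(value.substring(0, value.length() - 1));
            return new RelativeOrAbsoluteSketch(percent / 100.0, null);
        }
        long gb = 1024L * 1024 * 1024;
        long bytes = Long.parseLong(value.toLowerCase().replace("gb", "").trim()) * gb;
        return new RelativeOrAbsoluteSketch(null, bytes);
    }

    // Resolve the value against a total size: a ratio multiplies the total,
    // an absolute value is returned as-is.
    long resolve(long totalBytes) {
        return ratio != null ? (long) (totalBytes * ratio) : absoluteBytes;
    }

    public static void main(String[] args) {
        long tb = 1024L * 1024L * 1024L * 1024L;
        System.out.println(parse("90%").resolve(10 * tb));   // 90% of 10TB, in bytes
        System.out.println(parse("150gb").resolve(10 * tb)); // always 150GB, in bytes
    }
}
```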
cc'ing also @gmarouli, who I saw recently authored "Add disk thresholds in the cluster state" and could provide feedback as well.
Hi! I have been working on this issue for some days now, making changes across DiskThresholdSettings.java, DiskThresholdMonitor.java, DiskThresholdDecider.java, HealthMetadata.java, and the relevant tests. Just to confirm I am going the right way, I would like at least one more opinion. Since a couple of people are away, @fcofdez, would you mind taking a look at my comments above and checking whether I'm on the right track? In short:
Feedback is welcome.
I'll take a look at this next Monday, @kingherc, if that's ok.
I think this makes sense.
Maybe we can leave this for a follow-up PR? I think this is just a refactoring to clean up some old code, right?
I would expect to keep the old behaviour when we don't specify the new …
Yes, that's confusing. But we need to behave correctly in that case, i.e. use the frozen … cc @kingherc
Awesome, thanks for the pointers @fcofdez! I think the most important thing was to confirm I'm on the right path, which your answer suggests I am, so I will continue down this path.
I went that way because converting them to RelativeByteSizeValue simplifies the code (it was already done for the frozen max_headroom setting, so I could adopt similar code). However, it turns out I need to touch a lot of test code to make sure everything works appropriately, so if it ends up needing a lot more work, I may do it in a separate PR. I also believe I am not introducing backwards-incompatible changes or behaviour, since if the new max_headroom settings are not set, the old settings behave as before. For the frozen setting, I found that I can leave it as it is; that way, we do not need to deprecate anything or break backwards compatibility. However, I agree that it may be confusing, and I would especially welcome @henningandersen (who introduced the frozen setting that I'm mimicking) to comment on whether we should do anything.
Introduce max headroom settings for the low, high, and flood_stage disk watermarks, similar to the existing max headroom setting for the frozen tier's flood stage. Also, convert the disk watermarks to RelativeByteSizeValue, similar to the existing setting for the frozen tier's flood stage. Introduce the new max headrooms in HealthMetadata and in ReactiveStorageDeciderService. Add multiple tests in DiskThresholdDeciderUnitTests, DiskThresholdDeciderTests, and DiskThresholdMonitorTests. Fixes elastic#81406
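Assuming the new settings follow the semantics of the existing frozen-tier setting (a relative watermark whose implied free-space requirement is capped by a max headroom), here is a rough, illustrative calculation of the effective used-space thresholds. The 85%/90%/95% figures are today's documented watermark defaults; the 150GB/100GB/20GB headrooms are the values discussed earlier in this thread, not necessarily the final shipped defaults.

```java
// Rough illustration only: relative watermarks whose implied free-space
// requirement is capped by a per-stage max headroom. The 85%/90%/95% figures
// are today's documented watermark defaults; the 150GB/100GB/20GB headrooms are
// the values discussed in this thread, not necessarily the final shipped defaults.
public class EffectiveWatermarksSketch {

    // Used-bytes threshold at which a stage triggers on a disk of the given size.
    static long usedThreshold(long totalBytes, double usedFraction, long maxHeadroomBytes) {
        long freeFromPercent = (long) (totalBytes * (1.0 - usedFraction));
        long requiredFree = Math.min(freeFromPercent, maxHeadroomBytes);
        return totalBytes - requiredFree;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long tb = 1024L * gb;
        String[] stages = {"low", "high", "flood_stage"};
        double[] fractions = {0.85, 0.90, 0.95};
        long[] headrooms = {150 * gb, 100 * gb, 20 * gb};
        for (long total : new long[] {100 * gb, 10 * tb}) {
            for (int i = 0; i < stages.length; i++) {
                long threshold = usedThreshold(total, fractions[i], headrooms[i]);
                System.out.printf("%-11s on a %5d GB disk -> used threshold ~%5d GB%n",
                        stages[i], total / gb, threshold / gb);
            }
        }
    }
}
```

On a small disk the headroom caps never apply, so behaviour stays as before; only on large disks do the thresholds move closer to full, which is consistent with the backwards-compatibility point above.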
We cannot deprecate it, since it has a lower default value. For frozen there is no merging, so we can go much closer to the limit.
👍
Today, different data tiers all have the same
cluster.routing.allocation.disk.watermark.low|high|flood_stage
settings. These settings are relative values by default, which means that on a larger disk the defaults can introduce some level of wastage. Do we want to revisit the default values for the disk watermarks, especially for the different data tiers, which might have different space requirements?
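For example, with the current default high watermark of 90%, a 100GB node keeps roughly 10GB free, but a 10TB node must keep roughly 1TB free; the reserved space grows linearly with disk size even though the operational headroom a node actually needs does not.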