
Autoscaling proactive trigger on low watermark #78941

Merged

Conversation

henningandersen (Contributor):

Add test that we trigger a proactive scale up when the low watermark is exceeded.
@elasticmachine added the Team:Distributed (Obsolete) label (meta label for the distributed team, obsolete; replaced by Distributed Indexing/Coordination) on Oct 11, 2021
@elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed (Team:Distributed)

@@ -576,6 +576,12 @@ private SingleForecast forecast(IndexAbstraction.DataStream stream, long forecas
for (int i = 0; i < numberNewIndices; ++i) {
final String uuid = UUIDs.randomBase64UUID();
dataStream = dataStream.rollover(state.metadata(), uuid);

// this unintentionally copies the in-sync allocation ids too. This has the fortunate effect of these indices
henningandersen (Contributor Author) commented:
A prototype for a fix has been made, but since the original version already triggers at the low watermark, adding it will mostly be a refinement. I will follow up on that subsequently.

DaveCTurner (Contributor) left a comment:

Left a couple of small comments following our discussion.

Also nit: we use this idiom in ProactiveStorageDecider#testScaleUp

        for (int i = 0; i < between(1, 5); ++i) {

The probability of actually getting 5 iterations is very small (0.032%) - do we really want such a skewed distribution? If not, better to count down from a randomly-chosen start point instead; if so, a comment that it's not an accident would help.
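For reference, a minimal sketch of the count-down alternative (assuming ESTestCase's inclusive between(int, int) helper is in scope, as in the test under discussion; the skew in the current form comes from re-drawing the bound on every loop check, so reaching the maximum takes several lucky draws in a row):

    // Draw the iteration count once in the init clause, then count down:
    // every value in [1, 5] is now equally likely, instead of the
    // distribution being skewed heavily toward short loops.
    for (int i = between(1, 5); i > 0; --i) {
        // ... loop body unchanged ...
    }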

// set and therefore allocating these do not skip the low watermark check in the disk threshold decider.
// Fixing this simulation should be done as a separate effort, but we should still ensure that the low watermark is in effect
// at least when replicas are involved.
long enoughSpace = used + LOW_WATERMARK_BYTES - 1;
DaveCTurner (Contributor) commented:

We discussed some randomisation in this area to clarify that it doesn't matter where we are between the low & high watermarks:

Suggested change:

    -long enoughSpace = used + LOW_WATERMARK_BYTES - 1;
    +long enoughSpace = used + randomLongBetween(WATERMARK_BYTES + 1, LOW_WATERMARK_BYTES - 1);

(maybe one of those 1s is unnecessary too)

We also discussed making it even stronger, since (IIUC) we should scale up if we're within min(shard sizes) of the low watermark too.
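A rough sketch of that stronger condition (hedged: the names and the free-bytes framing are illustrative, not the decider's actual API):

    // Hypothetical check: scale up not only when free space is already below
    // the low watermark, but also when allocating even the smallest candidate
    // shard would take it below the low watermark.
    static boolean shouldScaleUpProactively(long freeBytes, long lowWatermarkFreeBytes, long minShardSizeBytes) {
        return freeBytes - minShardSizeBytes < lowWatermarkFreeBytes;
    }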

henningandersen (Contributor Author) replied:

We can use HIGH_WATERMARK directly, so I added this randomization.

The second part turned out not to work currently and will require the allocation work instead, since DiskThresholdDecider allows allocating the first shard that brings the node above the low watermark (it does not consider the shard size when checking the low watermark). I left this to address in that follow-up.
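To illustrate the gap (a hedged sketch of the behaviour described above, not DiskThresholdDecider's actual code):

    // The low watermark check passes on current free space alone; the size of
    // the incoming shard is not subtracted, so the first shard that pushes the
    // node past the low watermark is still allowed to allocate.
    static boolean lowWatermarkAllowsAllocation(long freeBytes, long lowWatermarkFreeBytes, long shardSizeBytes) {
        return freeBytes > lowWatermarkFreeBytes; // shardSizeBytes deliberately ignored
    }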

DaveCTurner (Contributor) left a comment:

LGTM

@henningandersen merged commit 5c386e1 into elastic:master on Oct 15, 2021
henningandersen added a commit that referenced this pull request on Oct 15, 2021:

Add test that we trigger a proactive scale up when the low watermark is exceeded.
Labels: :Distributed Coordination/Autoscaling, Team:Distributed (Obsolete), >test (Issues or PRs that are addressing/adding tests), v7.16.0, v8.0.0-beta1