
Autoscaling proactive trigger on low watermark #78941

Merged

Conversation

henningandersen (Contributor):

Add test that we trigger a proactive scale up when the low watermark is exceeded.
@elasticmachine added the Team:Distributed (Obsolete) label (meta label for the distributed team, obsolete; replaced by Distributed Indexing/Coordination) on Oct 11, 2021
@elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed (Team:Distributed)

@@ -576,6 +576,12 @@ private SingleForecast forecast(IndexAbstraction.DataStream stream, long forecas
for (int i = 0; i < numberNewIndices; ++i) {
final String uuid = UUIDs.randomBase64UUID();
dataStream = dataStream.rollover(state.metadata(), uuid);

// this unintentionally copies the in-sync allocation ids too. This has the fortunate effect of these indices
henningandersen (Contributor Author) commented:
A prototype for a fix has been made, but since the original version already triggers at the low watermark, adding it will mostly be a refinement. I will follow up on that subsequently.

DaveCTurner (Contributor) left a comment:

Left a couple of small comments following our discussion.

Also nit: we use this idiom in ProactiveStorageDecider#testScaleUp

        for (int i = 0; i < between(1, 5); ++i) {

The probability of actually getting 5 iterations is very small (0.032%) - do we really want such a skewed distribution? If not, better to count down from a randomly-chosen start point instead; if so, a comment that it's not an accident would help.
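For reference, a minimal sketch of the count-down alternative (assuming ESTestCase's inclusive between(int, int) helper is in scope, as in the test under discussion; the skew in the current form comes from re-drawing the bound on every loop check, so reaching the maximum takes several lucky draws in a row):

    // Draw the iteration count once in the init clause, then count down:
    // every value in [1, 5] is now equally likely, instead of the
    // distribution being skewed heavily toward short loops.
    for (int i = between(1, 5); i > 0; --i) {
        // ... loop body unchanged ...
    }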

// set and therefore allocating these do not skip the low watermark check in the disk threshold decider.
// Fixing this simulation should be done as a separate effort, but we should still ensure that the low watermark is in effect
// at least when replicas are involved.
long enoughSpace = used + LOW_WATERMARK_BYTES - 1;
DaveCTurner (Contributor) commented:

We discussed some randomisation in this area to clarify that it doesn't matter where we are between the low & high watermarks:

Suggested change:

    -long enoughSpace = used + LOW_WATERMARK_BYTES - 1;
    +long enoughSpace = used + randomLongBetween(WATERMARK_BYTES + 1, LOW_WATERMARK_BYTES - 1);

(maybe one of those 1s is unnecessary too)

We also discussed making it even stronger, since (IIUC) we should scale up if we're within min(shard sizes) of the low watermark too.
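A rough sketch of that stronger condition (hedged: the names and the free-bytes framing are illustrative, not the decider's actual API):

    // Hypothetical check: scale up not only when free space is already below
    // the low watermark, but also when allocating even the smallest candidate
    // shard would take it below the low watermark.
    static boolean shouldScaleUpProactively(long freeBytes, long lowWatermarkFreeBytes, long minShardSizeBytes) {
        return freeBytes - minShardSizeBytes < lowWatermarkFreeBytes;
    }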

henningandersen (Contributor Author) replied:

We can use HIGH_WATERMARK directly, so I added this randomization.

The second part turned out not to work currently and will require the allocation work instead, since DiskThresholdDecider allows allocating the first shard that brings the node above the low watermark (it does not consider the shard size when checking the low watermark). I left this to address in that follow-up.
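To illustrate the gap (a hedged sketch of the behaviour described above, not DiskThresholdDecider's actual code):

    // The low watermark check passes on current free space alone; the size of
    // the incoming shard is not subtracted, so the first shard that pushes the
    // node past the low watermark is still allowed to allocate.
    static boolean lowWatermarkAllowsAllocation(long freeBytes, long lowWatermarkFreeBytes, long shardSizeBytes) {
        return freeBytes > lowWatermarkFreeBytes; // shardSizeBytes deliberately ignored
    }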

DaveCTurner (Contributor) left a comment:

LGTM

@henningandersen merged commit 5c386e1 into elastic:master on Oct 15, 2021
henningandersen added a commit that referenced this pull request on Oct 15, 2021:

Add test that we trigger a proactive scale up when the low watermark is exceeded.
Labels: :Distributed Coordination/Autoscaling, Team:Distributed (Obsolete), >test (Issues or PRs that are addressing/adding tests), v7.16.0, v8.0.0-beta1