
Forecast write load during rollovers #91425

Merged (16 commits into elastic:main), Nov 14, 2022

Conversation

fcofdez (Contributor) commented Nov 8, 2022

No description provided.

@fcofdez added labels: >enhancement, :Distributed Coordination/Allocation, Team:Distributed (Obsolete), v8.6.0 (Nov 8, 2022)
elasticsearchmachine (Collaborator) commented:

Hi @fcofdez, I've created a changelog YAML for you.

out.writeDoubleArray(forecastedShardWriteLoad);
}

public static ClusterState maybeIncludeWriteLoadForecast(
fcofdez (Contributor, author) commented on the diff:

This is the method that computes the forecasted write load for the new write index.

TimeValue minShardUptime
) {
final IndexMetadata writeIndex = clusterState.metadata().getIndexSafe(dataStream.getWriteIndex());
if (IndexSettings.FORECAST_WRITE_LOAD_SETTING.get(writeIndex.getSettings()) == false) {
fcofdez (Contributor, author) commented on the diff:

This private setting is set by the new plugin in order to check if the cluster has the proper license to use this feature.

Settings allSettings,
List<CompressedXContent> combinedTemplateMappings
) {
if (dataStreamName != null && metadata.dataStreams().get(dataStreamName) != null && hasValidLicense.getAsBoolean()) {
fcofdez (Contributor, author) commented on the diff:

This is where we decide whether we can add the forecasted write load depending on the current license. It's a bit convoluted, but I couldn't find a better way to hook into the rollover.

Review comment (Contributor):

One option is to add a new method to ClusterPlugin that returns a WriteLoadHandler, which can then be implemented in the plugin? This can then have the 2-3 methods necessary (calculate write-load, get write-load).

henningandersen (Contributor) left a comment:

Thanks, left a few early comments on this.


final IndexWriteLoad writeLoad = indexMetadata.getWriteLoad();
for (int shardId = 0; shardId < numberOfShards; shardId++) {
OptionalDouble writeLoadForShard = writeLoad.getWriteLoadForShard(shardId);
Review comment (Contributor):

This can fail if the number of shards differs. Also, we may not consider all shards if the number of shards is now lower than before.

dataStream,
newState,
TimeValue.timeValueDays(7),
TimeValue.timeValueHours(8)
Review comment (Contributor):

This seems high, and I think it excludes all indices that roll over more frequently than every 8 hours from ever having a write load forecast?

I think we can lower it to, say, 1 hour. But I also wonder if we should use the average uptime of all candidate shards, such that we would use min(avgUptime, fixedMinimumUptime) as the actual minimum uptime.
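The min(avgUptime, fixedMinimumUptime) idea above can be sketched as follows. This is a hypothetical helper with made-up names, not the PR's actual code:

```java
// Sketch of the suggested rule: take the smaller of the average uptime of
// all candidate shards and a fixed floor, so that indices which roll over
// frequently are not excluded entirely. Names are hypothetical.
final class EffectiveMinUptime {
    static long effectiveMinUptimeMillis(long[] shardUptimesMillis, long fixedMinimumUptimeMillis) {
        if (shardUptimesMillis.length == 0) {
            return fixedMinimumUptimeMillis;
        }
        long sum = 0;
        for (long uptime : shardUptimesMillis) {
            sum += uptime;
        }
        long avgUptime = sum / shardUptimesMillis.length;
        // frequently rolled-over indices get a lower effective minimum
        return Math.min(avgUptime, fixedMinimumUptimeMillis);
    }
}
```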

if (allSettings.hasValue(IndexSettings.DEFAULT_WRITE_LOAD_SETTING.getKey())) {
// TODO: warn when the setting exists and the license is invalid?
settingsBuilder.put(
IndexSettings.DEFAULT_INTERNAL_WRITE_LOAD_SETTING.getKey(),
Review comment (Contributor):

This looks unnecessary? I think we can instead guard the read of the write load setting by the FORECAST_WRITE_LOAD_SETTING?

if (shardIndicesTookIntoAccount > 0) {
modified = true;
double normalizedShardLoad = (totalWriteLoad[shardId] / shardIndicesTookIntoAccount) / maxWriteLoad[shardId];
projectedWriteLoad.withForecastedWriteLoad(shardId, normalizedShardLoad);
Review comment (Contributor):

I think we should instead end up with one load number for the index, which can then be spread evenly across the shards. Trying to preserve any shard stickiness (i.e., routing use) seems heroic and makes it harder to disregard shards with short uptimes.
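The even-spread suggestion amounts to something like the sketch below (hypothetical names; not the PR's actual code):

```java
// Sketch: compute one load number for the whole index and spread it evenly
// across shards, instead of preserving per-shard stickiness. Hypothetical names.
final class EvenSpreadForecast {
    static double[] perShardForecast(double indexWriteLoad, int numberOfShards) {
        double[] forecast = new double[numberOfShards];
        // every shard gets the same share of the index-level forecast
        java.util.Arrays.fill(forecast, indexWriteLoad / numberOfShards);
        return forecast;
    }
}
```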


fcofdez (Contributor, author) commented Nov 9, 2022

@henningandersen thanks for the early feedback. I iterated on this based on your comments: all the forecast logic now lives in the plugin, and we compute a single value based on the average write load of all past indices. Let me know what you think.

henningandersen (Contributor) left a comment:

This direction looks good. I left a few detail comments. I did not dive into the tests yet since this is a draft; happy to do so if the tests are ready for review?

final long indexAge = System.currentTimeMillis() - indexMetadata.getCreationDate();
final IndexWriteLoad writeLoad = indexMetadata.getWriteLoad();

if (index.equals(dataStream.getWriteIndex()) || indexAge > maxIndexAge.millis() || writeLoad == null) {
Review comment (Contributor):

For rollovers that happen less frequently than maxIndexAge, I think we'd want to include the index. I think we also want to include the first index (in creation order) whose creation date is before the max age (though that may be unnecessary if we just ensure we include a minimum of one index). The reason is that in order to look back max-age time, you need to include the last index created before max-age, since its lifetime overlaps the lookback period.
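The lookback selection described above can be sketched like this (a hypothetical helper operating on creation dates only; the real code works on IndexMetadata):

```java
// Sketch: keep every index created within maxIndexAge, plus the newest index
// created before the cutoff, since its lifetime overlaps the lookback window.
// Indices are assumed ordered oldest-first by creation date. Hypothetical names.
final class LookbackSelection {
    static java.util.List<Long> creationDatesForLookback(
        java.util.List<Long> creationDatesOldestFirst, long nowMillis, long maxAgeMillis) {
        long cutoff = nowMillis - maxAgeMillis;
        int first = 0;
        for (int i = 0; i < creationDatesOldestFirst.size(); i++) {
            if (creationDatesOldestFirst.get(i) < cutoff) {
                first = i; // newest index so far that predates the cutoff
            }
        }
        // includes all indices from the newest pre-cutoff one onwards
        return creationDatesOldestFirst.subList(first, creationDatesOldestFirst.size());
    }
}
```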

assert uptimeInMillisForShard.isPresent();
double shardWriteLoad = writeLoadForShard.getAsDouble();
long shardUptimeInMillis = uptimeInMillisForShard.getAsLong();
if (shardUptimeInMillis > minShardUptime.millis()) {
Review comment (Contributor):

I wonder if we can improve this, avoiding the minimum-shard-uptime setting. My idea is to use the uptime to weight the average: we compute sum(shardWriteLoad * shardUptimeInMillis) and sum(shardUptimeInMillis) (summing over all shards of all indices) and then calculate the final indexWriteLoadAvg (a few lines below) as

sum(shardWriteLoad * shardUptimeInMillis) / sum(shardUptimeInMillis)

That way, if a shard had very short uptime, its impact on the indexWriteLoadAvg will also be very small. And we avoid the minimum shard uptime setting, since the inaccuracy of short uptimes will not matter, but we will still use whatever data we have as the basis.

Otherwise I think we still risk not having any data if we roll over, for instance, every 30 minutes on an index (unless I missed where that is addressed).
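The uptime-weighted average proposed here can be sketched as follows (hypothetical names; a minimal illustration of the formula, not the actual implementation):

```java
// Sketch of the uptime-weighted average: shards with very short uptime
// contribute proportionally little, so no separate minimum-uptime setting
// is needed. Hypothetical names, not the actual implementation.
final class WeightedWriteLoad {
    // returns sum(load_i * uptime_i) / sum(uptime_i), or 0.0 when there is no uptime
    static double indexWriteLoadAvg(double[] shardWriteLoads, long[] shardUptimesMillis) {
        double weightedLoadSum = 0.0;
        long uptimeSum = 0L;
        for (int i = 0; i < shardWriteLoads.length; i++) {
            weightedLoadSum += shardWriteLoads[i] * shardUptimesMillis[i];
            uptimeSum += shardUptimesMillis[i];
        }
        return uptimeSum == 0 ? 0.0 : weightedLoadSum / uptimeSum;
    }
}
```

For example, a shard with load 2.0 over an hour dominates a shard with load 0.5 over one minute, without either being discarded outright.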

);

public static final Setting<Double> DEFAULT_WRITE_LOAD_FORECAST_SETTING = Setting.doubleSetting(
"index.default_write_load_forecast",
Review comment (Contributor):

Is this not the override more than the default write load? It seems to override our automated value.

DaveCTurner (Contributor) left a comment:

Just a few thoughts on the overall structure, I'm taking a deeper look at the details of the computation now.

@@ -307,6 +311,8 @@ private RolloverResult rolloverDataStream(
)
.build();

newState = writeLoadForecaster.withWriteLoadForecastForWriteIndex(dataStreamName, newState);
Review comment (Contributor):

Could we make this an operator (IndexMetadata.Builder -> IndexMetadata.Builder) instead of ClusterState -> ClusterState, to avoid having to build a new Metadata twice?
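The suggested shape looks roughly like the sketch below, where a stub Builder stands in for IndexMetadata.Builder (all names here are hypothetical, not the PR's code):

```java
// Sketch: the forecaster exposes a UnaryOperator over a builder instead of
// transforming a whole ClusterState, so the caller builds Metadata only once.
// Builder is a stand-in for IndexMetadata.Builder; names are hypothetical.
final class OperatorShapeSketch {
    static final class Builder { // stand-in for IndexMetadata.Builder
        double forecast;

        Builder forecastedWriteLoad(double v) {
            this.forecast = v;
            return this;
        }
    }

    // caller applies this to the write index's builder before building Metadata
    static java.util.function.UnaryOperator<Builder> withWriteLoadForecast(double forecast) {
        return builder -> builder.forecastedWriteLoad(forecast);
    }
}
```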

@@ -1162,6 +1174,10 @@ public IndexWriteLoad getWriteLoad() {
return writeLoad;
}

public OptionalDouble getForecastedWriteLoad() {
Review comment (Contributor):

I think it would be good to mark this as a forbidden API so we don't inadvertently call it directly, which would bypass the license check built into WriteLoadForecaster#getForecastedWriteLoad.

@fcofdez fcofdez changed the title [WIP] Forecast normalized write load during rollovers Forecast write load during rollovers Nov 10, 2022
@fcofdez fcofdez marked this pull request as ready for review November 10, 2022 14:30
elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-distributed (Team:Distributed)

fcofdez (Contributor, author) commented Nov 10, 2022

I think this is ready for review. There are a few doc-test failures that I'm looking into, but they shouldn't block the review.

fcofdez (Contributor, author) commented Nov 10, 2022

@elasticmachine run elasticsearch-ci/part-1

henningandersen (Contributor) left a comment:

LGTM.

Left a few comments, but unless radical changes come up, no need for another review round.

for (int i = 0; i < dataStreamIndices.size(); i++) {
Index index = dataStreamIndices.get(i);
final IndexMetadata indexMetadata = metadata.getSafe(index);
final long indexAge = System.currentTimeMillis() - indexMetadata.getCreationDate();
Review comment (Contributor):

Can we use ThreadPool.absoluteTimeInMillis instead, and only read it once outside the loop?

Comment on lines 120 to 122
return firstIndexWithinAgeRange == 0
? dataStreamIndices
: dataStreamIndices.subList(firstIndexWithinAgeRange, dataStreamIndices.size());
Review comment (Contributor):

The full-list optimization seems unimportant; can we not just do:

Suggested change
- return firstIndexWithinAgeRange == 0
-     ? dataStreamIndices
-     : dataStreamIndices.subList(firstIndexWithinAgeRange, dataStreamIndices.size());
+ return dataStreamIndices.subList(firstIndexWithinAgeRange, dataStreamIndices.size());


final OptionalDouble indexMetadataForecastedWriteLoad = writeIndexMetadata.getForecastedWriteLoad();
assertThat(indexMetadataForecastedWriteLoad.isPresent(), is(equalTo(true)));
assertThat(indexMetadataForecastedWriteLoad.getAsDouble(), is(greaterThanOrEqualTo(0.0)));
Review comment (Contributor):

Can we change this to:

Suggested change
- assertThat(indexMetadataForecastedWriteLoad.getAsDouble(), is(greaterThanOrEqualTo(0.0)));
+ assertThat(indexMetadataForecastedWriteLoad.getAsDouble(), is(greaterThan(0.0)));

since we assertBusy in the setup, waiting for a write load to be registered?

}

public static class FakeLicenseWriteLoadForecasterPlugin extends WriteLoadForecasterPlugin {
private static final AtomicBoolean hasValidLicense = new AtomicBoolean(true);
Review comment (Contributor):

I'd prefer to change to a non-static field here. I think you can find the plugins using

internalCluster().getInstances(PluginsService.class).filterPlugins(FakeLicenseWriteLoadForecasterPlugin.class)



clusterSettings.addSettingsUpdateConsumer(MAX_INDEX_AGE_SETTING, this::setMaxIndexAgeSetting);
}

LicensedWriteLoadForecaster(BooleanSupplier hasValidLicense, TimeValue maxIndexAge) {
Review comment (Contributor):

Can we add a "// exposed for tests only" comment here?


final IndexMetadata writeIndex = updatedMetadataBuilder.getSafe(dataStream.getWriteIndex());

final OptionalDouble forecastedWriteLoadForShard = writeLoadForecaster.getForecastedWriteLoad(writeIndex);
Review comment (Contributor):

I am not sure I understand why the variable is named "ForShard"? I think the write load we obtain here is for the index in total and needs to be divided by the number of shards. Perhaps we should do that inside getForecastedWriteLoad? A question for the integration into the actual balancer, I suppose.

fcofdez (Contributor, author) replied:

This is a leftover from the previous approach 🤦.

fcofdez (Contributor, author) commented Nov 14, 2022

@elasticmachine update branch

fcofdez (Contributor, author) commented Nov 14, 2022

@elasticmachine generate changelog

elasticsearchmachine (Collaborator) commented:

Hi @fcofdez, I've created a changelog YAML for you.

@fcofdez fcofdez merged commit 089ee1d into elastic:main Nov 14, 2022
fcofdez (Contributor, author) commented Nov 14, 2022

Thanks for the reviews!

weizijun added a commit to weizijun/elasticsearch that referenced this pull request Nov 15, 2022
* main: (163 commits), including Forecast write load during rollovers (elastic#91425)

# Conflicts:
#	x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/downsample/RollupShardIndexer.java