
Make enrich cache based on memory usage #111412

Merged · 10 commits into elastic:main · Aug 23, 2024
Conversation

nielsbauman (Contributor):

Instead of relying on a flat document count, the enrich cache now, by default, looks at the size in bytes of the search results and aims to avoid using more than 1% of the node's max heap space. This size in bytes is an approximation, meaning we can't guarantee it won't exceed the 1% threshold, but it shouldn't be far off.

Closes #106081
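As a rough illustration of the mechanism described above, a cache bounded by the estimated byte size of its values, rather than by a flat entry count, can be sketched as follows. This is a simplified, hypothetical sketch: the class name, the LRU eviction policy, and representing a value by its size alone are all illustrative assumptions, not the PR's implementation (which hands a per-entry size function to an internal cache builder, as a later snippet in this thread shows).

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an LRU cache bounded by the summed *estimated* size in
// bytes of its entries. For brevity, a cached value is represented only by its
// estimated size in bytes.
class SizeBoundedCache {
    private final long maxBytes; // e.g. 1% of the node's max heap
    private long currentBytes = 0;
    private final LinkedHashMap<String, Long> entries =
        new LinkedHashMap<>(16, 0.75f, true); // access-order: iteration starts at the LRU entry

    SizeBoundedCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    void put(String key, long sizeInBytes) {
        Long previous = entries.put(key, sizeInBytes);
        if (previous != null) {
            currentBytes -= previous;
        }
        currentBytes += sizeInBytes;
        // Evict least-recently-used entries until we are back under budget.
        Iterator<Map.Entry<String, Long>> it = entries.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            currentBytes -= it.next().getValue();
            it.remove();
        }
    }

    Long get(String key) {
        return entries.get(key);
    }

    long currentBytes() {
        return currentBytes;
    }
}
```

Because the per-entry sizes are estimates, the budget is approximate in the same way the PR describes: the cache aims to stay under the limit but cannot strictly guarantee it.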

@nielsbauman nielsbauman added >enhancement :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team v8.16.0 labels Jul 29, 2024
@nielsbauman nielsbauman requested a review from joegallo July 29, 2024 14:17
Contributor:

Documentation preview:

@elasticsearchmachine (Collaborator):

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine (Collaborator):

Hi @nielsbauman, I've created a changelog YAML for you.

docs/reference/ingest/enrich.asciidoc (outdated, resolved review thread):
EnrichCache(ByteSizeValue maxByteSize, LongSupplier relativeNanoTimeProvider) {
this.relativeNanoTimeProvider = relativeNanoTimeProvider;
this.cache = createCache(maxByteSize.getBytes(), (key, value) -> value.sizeInBytes);
}
nielsbauman (Author):

I didn't add any explicit tests for the default case (i.e. where we configure a number of bytes instead of a flat document count) because I think that's pretty much covered by all test clusters that don't explicitly configure an enrich cache size. Let me know if people have any other thoughts.

@joegallo (Contributor) commented Aug 2, 2024:

I'm not sure 1% of heap is a reasonable default -- but I'm not sure it's not a reasonable default, either.

The default from before this PR is 1000 items. The default size of a data node on https://cloud.elastic.co/ for a new cluster is 8gb (so that's ~4gb of JVM heap). 1% of that is 40mb. I have in my notes that a fair estimate of the size of a cache entry is 2kb. So my read of things is that we're more or less changing the default size from 1000 to 20,000 in the case of an 8gb node. That seems like a pretty big jump!
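Written out, the back-of-the-envelope math above looks like this (the ~4gb heap and ~2kb-per-entry figures are the comment's rough estimates, not measured values):

```java
class EnrichCacheSizing {
    // Implied entry count for a heap-percentage budget, given a rough
    // per-entry size estimate. All inputs are rough estimates from the
    // comment above, not measured values.
    static long impliedEntries(long heapBytes, int percent, long perEntryBytes) {
        long budgetBytes = heapBytes * percent / 100; // e.g. 1% of ~4gb is ~40mb
        return budgetBytes / perEntryBytes;
    }

    public static void main(String[] args) {
        long heap = 4L * 1024 * 1024 * 1024;                   // ~4gb heap on an 8gb node
        System.out.println(impliedEntries(heap, 1, 2 * 1024)); // roughly 20,000 entries
    }
}
```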

@joegallo (Contributor) commented Aug 2, 2024:

Besides the question of what the new default value should be (as a percentage of the node's JVM heap), the biggest open question I have on this PR is whether we're okay with the valid user-settable values for this setting being in terms of the count of items in the cache, while the default value is expressed as something different than that.

In the version of this where we expose maximum configurability, I could see us providing three ways of setting the cache size:

  1. A maximum size of the cache in terms of a count of cache entries
  2. A maximum size of the cache in terms of the estimated memory consumed by the cache entries (expressed in say, mb or whatever)
  3. A maximum size of the cache in terms of the estimated memory consumed by the cache entries (expressed as percent of the JVM heap of the node)

The behavior before this PR only supports 1. The new behavior in this PR is that the user-configurable behavior remains just 1, but we have a default in terms of 3.

On the one hand, I hesitate for us to implement umpteen options that are theoretically interesting but that aren't actually going to be used in practice. On the other hand, I could see somebody hitting the 1% limit and thinking "oh, I'll just double it to 2% then" and being a little annoyed that we don't provide the ability to do that.

@dakrone what do you think?

@joegallo joegallo requested review from dakrone and removed request for joegallo August 2, 2024 16:32
@dakrone (Member) left a comment:

I left a couple of comments. I think we should make sure we allow specifying it in the format we intend to make the default (percentage), and probably in absolute size as well.

/**
* A class that specifies either a flat (unit-less) number or a byte size value.
*/
public static class FlatNumberOrByteSizeValue {
nielsbauman (Author):

Any suggestions for this name are welcome 😅 (also its location, if people think it should be a separate class and/or in a different package).

dakrone (Member):

I think it's okay here. I can't think of a better name either

determines the size of that cache.
Maximum size of the cache that caches searches for enriching documents.
The size can be specified in three units: the raw number of
cached searches (e.g. `1000`), an absolute size in bytes (e.g. `100Mb`),
nielsbauman (Author):

Should we explicitly state here that you need to include a b (upper or lowercase) for the second unit?

dakrone (Member):

I think we should allow all the same things that ByteSizeValue supports, which would be all the endings in:

if (lowerSValue.endsWith("k")) {
return parse(sValue, lowerSValue, "k", ByteSizeUnit.KB, settingName);
} else if (lowerSValue.endsWith("kb")) {
return parse(sValue, lowerSValue, "kb", ByteSizeUnit.KB, settingName);
} else if (lowerSValue.endsWith("m")) {
return parse(sValue, lowerSValue, "m", ByteSizeUnit.MB, settingName);
} else if (lowerSValue.endsWith("mb")) {
return parse(sValue, lowerSValue, "mb", ByteSizeUnit.MB, settingName);
} else if (lowerSValue.endsWith("g")) {
return parse(sValue, lowerSValue, "g", ByteSizeUnit.GB, settingName);
} else if (lowerSValue.endsWith("gb")) {
return parse(sValue, lowerSValue, "gb", ByteSizeUnit.GB, settingName);
} else if (lowerSValue.endsWith("t")) {
return parse(sValue, lowerSValue, "t", ByteSizeUnit.TB, settingName);
} else if (lowerSValue.endsWith("tb")) {
return parse(sValue, lowerSValue, "tb", ByteSizeUnit.TB, settingName);
} else if (lowerSValue.endsWith("p")) {
return parse(sValue, lowerSValue, "p", ByteSizeUnit.PB, settingName);
} else if (lowerSValue.endsWith("pb")) {
return parse(sValue, lowerSValue, "pb", ByteSizeUnit.PB, settingName);
} else if (lowerSValue.endsWith("b")) {

nielsbauman (Author):

I've updated the logic to handle all of ByteSizeValue's cases. But I was more referring to the fact that if you want to specify bytes, you'll have to add one of those letters -- since specifying a raw number of bytes would be parsed as a flat document count. But that disclaimer might be redundant/overkill.
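That distinction, a raw number is a flat document count while a suffixed number is a byte size, can be sketched with a simplified, hypothetical parser. The PR's actual `FlatNumberOrByteSizeValue` delegates suffix handling to `ByteSizeValue` and, per the final version, also accepts percentages; this sketch handles only a few suffixes and its class name is illustrative.

```java
// Hypothetical, simplified parser illustrating the rule discussed above:
// a raw number is a flat document count; a recognized byte suffix makes it bytes.
class FlatOrBytes {
    final Long flatCount; // non-null when the input is a unit-less number
    final Long byteSize;  // non-null when the input carries a byte-size suffix

    private FlatOrBytes(Long flatCount, Long byteSize) {
        this.flatCount = flatCount;
        this.byteSize = byteSize;
    }

    static FlatOrBytes parse(String value) {
        String v = value.trim().toLowerCase();
        if (v.matches("\\d+")) {
            return new FlatOrBytes(Long.parseLong(v), null); // e.g. "1000" -> flat count
        }
        long multiplier;
        if (v.endsWith("kb") || v.endsWith("k")) {
            multiplier = 1024L;
        } else if (v.endsWith("mb") || v.endsWith("m")) {
            multiplier = 1024L * 1024;
        } else if (v.endsWith("gb") || v.endsWith("g")) {
            multiplier = 1024L * 1024 * 1024;
        } else if (v.endsWith("b")) {
            multiplier = 1L; // e.g. "1000b" -> 1000 bytes, not a flat count
        } else {
            throw new IllegalArgumentException("unsupported value: " + value);
        }
        long digits = Long.parseLong(v.replaceAll("[a-z]+$", ""));
        return new FlatOrBytes(null, digits * multiplier);
    }
}
```

So `1000` and `1000b` mean very different things, which is exactly the disclaimer being debated above.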


private static final String SETTING_NAME = "test.setting";

public void testParse() {
nielsbauman (Author):

I currently only have tests for parsing. Should we also have tests that actually try to verify the behavior of using a byte size value in the cache instead of a flat document count?

dakrone (Member):

I think it'd be good to add those; it's fine to just expose that in EnrichCache with a getMaxSize() method, and test that configuring it to an absolute byte size value sets it appropriately.

@dakrone (Member) left a comment:

I left a few more comments; I think it LGTM in general, though!

@nielsbauman nielsbauman requested a review from dakrone August 21, 2024 08:02
@dakrone (Member) left a comment:

LGTM, I left a couple of really minor comments

@nielsbauman nielsbauman requested a review from dakrone August 22, 2024 11:20
@nielsbauman nielsbauman merged commit e0c1ccb into elastic:main Aug 23, 2024
15 checks passed
@nielsbauman nielsbauman deleted the enrich-cache branch August 23, 2024 07:26
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Sep 4, 2024
The max enrich cache size setting now also supports an absolute max size in bytes (of used heap space) and a percentage of the max heap space, alongside the existing flat document count. The default is 1% of the max heap space.

This should prevent issues where the enrich cache takes up a lot of memory when there are large documents in the cache.
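Concretely, the three accepted forms would look something like this in `elasticsearch.yml` (illustrative values; only one would be set at a time). Note the follow-up commits below: this PR accidentally renamed the setting to `enrich.cache.size`, and later fixes accept both names while deprecating the wrong one.

```yaml
# Illustrative values; set only one form.
enrich.cache_size: 1000      # flat number of cached search results
# enrich.cache_size: 100mb   # absolute max size in bytes of used heap space
# enrich.cache_size: "1%"    # percentage of the node's max heap (the default)
```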
nielsbauman added a commit to nielsbauman/elasticsearch that referenced this pull request Nov 26, 2024
The enrich cache size setting accidentally got renamed from
`enrich.cache_size` to `enrich.cache.size` in elastic#111412. This commit
reverts that rename.

The fix that gets backported to 8.16, 8.17 and 8.x will allow both
versions, to avoid breaking BWC twice.
nielsbauman added a commit to nielsbauman/elasticsearch that referenced this pull request Nov 26, 2024
The enrich cache size setting accidentally got renamed from
`enrich.cache_size` to `enrich.cache.size` in elastic#111412. This commit
ensures we accept both versions, to avoid breaking BWC twice.
nielsbauman added a commit that referenced this pull request Dec 9, 2024
The enrich cache size setting accidentally got renamed from
`enrich.cache_size` to `enrich.cache.size` in #111412. This commit
updates the enrich plugin to accept both names and deprecates the
wrong name.
elasticsearchmachine pushed a commit that referenced this pull request Dec 9, 2024
* Fix enrich cache size setting name (#117575)

The enrich cache size setting accidentally got renamed from
`enrich.cache_size` to `enrich.cache.size` in #111412. This commit
updates the enrich plugin to accept both names and deprecates the
wrong name.

* Remove `UpdateForV10` annotation
Successfully merging this pull request may close these issues.

Enrich Cache should be based on memory usage instead of a flat document count