-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make enrich cache based on memory usage #111412
Conversation
Documentation preview: |
Pinging @elastic/es-data-management (Team:Data Management) |
Hi @nielsbauman, I've created a changelog YAML for you. |
x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichPlugin.java
Outdated
Show resolved
Hide resolved
EnrichCache(ByteSizeValue maxByteSize, LongSupplier relativeNanoTimeProvider) { | ||
this.relativeNanoTimeProvider = relativeNanoTimeProvider; | ||
this.cache = createCache(maxByteSize.getBytes(), (key, value) -> value.sizeInBytes); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't add any explicit tests for the default case (i.e. where we configure a number of bytes instead of a flat document count) because I think that's pretty much covered by all test clusters that don't explicitly configure an enrich cache size. Let me know if people have any other thoughts.
server/src/main/java/org/elasticsearch/common/settings/Setting.java
Outdated
Show resolved
Hide resolved
I'm not sure 1% of heap is a reasonable default -- but I'm not sure it's not a reasonable default, either. The default from before this PR is 1000 items. The default size of a data node on https://cloud.elastic.co/ for a new cluster is 8gb (so that's ~4gb of JVM heap). 1% of that is 40mb. I have in my notes that a fair estimate of the size of a cache entry is 2kb. So my read of things is that we're more or less changing the default size from 1000 to 20,000 in the case of 8gb node. That seems like a pretty big jump! |
Besides the question of what the new default value should be (as a percentage of the JVM heap of the node), the biggest open question I have on this PR is whether we're okay with having the valid user-settable values for this setting to be in terms of the count of the items in the cache, but the default value of the size of the being something different than that. In the version of this where we expose maximum configurability, I could see us providing three ways of setting the cache size:
The current behavior before this PR only supports 1. The new behavior on this PR is that the user configurable behavior remains only just 1, but that we have a default in terms of 3. On the one hand, I hesitate for us to implement umpteen options that are theoretically interesting but that aren't actually going to be used in practice. On the other hand, I could see somebody hitting the 1% limit and thinking "oh, I'll just double it to 2% then" and being a little annoyed that we don't provide the ability to do that. @dakrone what do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple of comments, I think we should make sure we allow specifying it in the format we intend to make the default (percentage) and probably also in absolute size as well.
x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichPlugin.java
Outdated
Show resolved
Hide resolved
/** | ||
* A class that specifies either a flat (unit-less) number or a byte size value. | ||
*/ | ||
public static class FlatNumberOrByteSizeValue { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any suggestions for this name are welcome 😅 (also it's location if people think it should be a separate class and/or in a different package).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay here. I can't think of a better name either
determines the size of that cache. | ||
Maximum size of the cache that caches searches for enriching documents. | ||
The size can be specified in three units: the raw number of | ||
cached searches (e.g. `1000`), an absolute size in bytes (e.g. `100Mb`), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we explicitly state here that you need to in include a b
(upper or lowercase) for the second unit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should allow all the same things that ByteSizeValue
supports, which would be all the endings in
elasticsearch/server/src/main/java/org/elasticsearch/common/unit/ByteSizeValue.java
Lines 232 to 252 in ffc22b2
if (lowerSValue.endsWith("k")) { | |
return parse(sValue, lowerSValue, "k", ByteSizeUnit.KB, settingName); | |
} else if (lowerSValue.endsWith("kb")) { | |
return parse(sValue, lowerSValue, "kb", ByteSizeUnit.KB, settingName); | |
} else if (lowerSValue.endsWith("m")) { | |
return parse(sValue, lowerSValue, "m", ByteSizeUnit.MB, settingName); | |
} else if (lowerSValue.endsWith("mb")) { | |
return parse(sValue, lowerSValue, "mb", ByteSizeUnit.MB, settingName); | |
} else if (lowerSValue.endsWith("g")) { | |
return parse(sValue, lowerSValue, "g", ByteSizeUnit.GB, settingName); | |
} else if (lowerSValue.endsWith("gb")) { | |
return parse(sValue, lowerSValue, "gb", ByteSizeUnit.GB, settingName); | |
} else if (lowerSValue.endsWith("t")) { | |
return parse(sValue, lowerSValue, "t", ByteSizeUnit.TB, settingName); | |
} else if (lowerSValue.endsWith("tb")) { | |
return parse(sValue, lowerSValue, "tb", ByteSizeUnit.TB, settingName); | |
} else if (lowerSValue.endsWith("p")) { | |
return parse(sValue, lowerSValue, "p", ByteSizeUnit.PB, settingName); | |
} else if (lowerSValue.endsWith("pb")) { | |
return parse(sValue, lowerSValue, "pb", ByteSizeUnit.PB, settingName); | |
} else if (lowerSValue.endsWith("b")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the logic to handle all of ByteSizeValue
's cases. But I was more referring to that if you want to specify bytes, you'll have to add one of those letters -- since specifying a raw number of bytes would be parsed as a flat document count. But that disclaimer might be redundant/overkill.
|
||
private static final String SETTING_NAME = "test.setting"; | ||
|
||
public void testParse() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I currently only have tests for parsing. Should we also have tests that actually try to verify the behavior of using a byte size value in the cache instead of a flat document count?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be good to add those, it's fine to just expose that in EnrichCache
with a getMaxSize()
method, and test that configuring it to an absolute byte size value sets it appropriately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few more comments, I think it LGTM in general though!
determines the size of that cache. | ||
Maximum size of the cache that caches searches for enriching documents. | ||
The size can be specified in three units: the raw number of | ||
cached searches (e.g. `1000`), an absolute size in bytes (e.g. `100Mb`), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should allow all the same things that ByteSizeValue
supports, which would be all the endings in
elasticsearch/server/src/main/java/org/elasticsearch/common/unit/ByteSizeValue.java
Lines 232 to 252 in ffc22b2
if (lowerSValue.endsWith("k")) { | |
return parse(sValue, lowerSValue, "k", ByteSizeUnit.KB, settingName); | |
} else if (lowerSValue.endsWith("kb")) { | |
return parse(sValue, lowerSValue, "kb", ByteSizeUnit.KB, settingName); | |
} else if (lowerSValue.endsWith("m")) { | |
return parse(sValue, lowerSValue, "m", ByteSizeUnit.MB, settingName); | |
} else if (lowerSValue.endsWith("mb")) { | |
return parse(sValue, lowerSValue, "mb", ByteSizeUnit.MB, settingName); | |
} else if (lowerSValue.endsWith("g")) { | |
return parse(sValue, lowerSValue, "g", ByteSizeUnit.GB, settingName); | |
} else if (lowerSValue.endsWith("gb")) { | |
return parse(sValue, lowerSValue, "gb", ByteSizeUnit.GB, settingName); | |
} else if (lowerSValue.endsWith("t")) { | |
return parse(sValue, lowerSValue, "t", ByteSizeUnit.TB, settingName); | |
} else if (lowerSValue.endsWith("tb")) { | |
return parse(sValue, lowerSValue, "tb", ByteSizeUnit.TB, settingName); | |
} else if (lowerSValue.endsWith("p")) { | |
return parse(sValue, lowerSValue, "p", ByteSizeUnit.PB, settingName); | |
} else if (lowerSValue.endsWith("pb")) { | |
return parse(sValue, lowerSValue, "pb", ByteSizeUnit.PB, settingName); | |
} else if (lowerSValue.endsWith("b")) { |
x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichPlugin.java
Outdated
Show resolved
Hide resolved
/** | ||
* A class that specifies either a flat (unit-less) number or a byte size value. | ||
*/ | ||
public static class FlatNumberOrByteSizeValue { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay here. I can't think of a better name either
|
||
private static final String SETTING_NAME = "test.setting"; | ||
|
||
public void testParse() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be good to add those, it's fine to just expose that in EnrichCache
with a getMaxSize()
method, and test that configuring it to an absolute byte size value sets it appropriately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I left a couple of really minor comments
x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichPlugin.java
Outdated
Show resolved
Hide resolved
...ugin/enrich/src/test/java/org/elasticsearch/xpack/enrich/FlatNumberOrByteSizeValueTests.java
Outdated
Show resolved
Hide resolved
The max enrich cache size setting now also supports an absolute max size in bytes (of used heap space) and a percentage of the max heap space, next to the existing flat document count. The default is 1% of the max heap space. This should prevent issues where the enrich cache takes up a lot of memory when there are large documents in the cache.
The enrich cache size setting accidentally got renamed from `enrich.cache_size` to `enrich.cache.size` in elastic#111412. This commit reverts that rename. The fix that gets backported to 8.16, 8.17 and 8.x will allow both versions, to avoid breaking BWC twice.
The enrich cache size setting accidentally got renamed from `enrich.cache_size` to `enrich.cache.size` in elastic#111412. This commit ensures we accept both versions, to avoid breaking BWC twice.
The enrich cache size setting accidentally got renamed from `enrich.cache_size` to `enrich.cache.size` in #111412. This commit updates the enrich plugin to accept both names and deprecates the wrong name.
The enrich cache size setting accidentally got renamed from `enrich.cache_size` to `enrich.cache.size` in elastic#111412. This commit updates the enrich plugin to accept both names and deprecates the wrong name.
The enrich cache size setting accidentally got renamed from `enrich.cache_size` to `enrich.cache.size` in elastic#111412. This commit updates the enrich plugin to accept both names and deprecates the wrong name.
The enrich cache size setting accidentally got renamed from `enrich.cache_size` to `enrich.cache.size` in elastic#111412. This commit updates the enrich plugin to accept both names and deprecates the wrong name.
Instead of having the enrich cache rely simply on a flat document count, the cache now, by default, looks at the size in bytes of the search results and aims to avoid using more than 1% of the node's max heap space. This size in bytes of the search results is an approximation - meaning we can't guarantee it won't exceed the 1% threshold - but it shouldn't be far off.
Closes #106081