Intern IndexFieldCapabilities Type String on Read #76405

original-brownbear · 2021-08-12T11:42:17Z

Kind of a brute-force approach to the problem, but this should improve a few reported instances of slow and memory-intensive field caps at no real risk IMO:

When handling a large number of these messages, i.e. when fetching field caps
for many indices (and/or indices that contain lots of fields), the type string is repeated
many times over. As these strings are already interned because they are constants, taking
the performance hit of interning them on deserialization seems a reasonable trade-off. It
saves a non-trivial amount of memory on large clusters and speeds up
`org.elasticsearch.action.fieldcaps.TransportFieldCapabilitiesAction#merge`,
which uses these strings in map lookups and will run significantly faster with interned strings
than with fresh strings that do not have their hash values cached yet.
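
A minimal sketch of the effect described above, not code from this PR. `readString` is a hypothetical stand-in for `StreamInput#readString` and `InternOnReadSketch` is an illustrative name; the only assumption is that every read decodes a fresh `String` from the wire bytes.

```java
import java.nio.charset.StandardCharsets;

final class InternOnReadSketch {

    // Hypothetical stand-in for StreamInput#readString: every call decodes a new String object.
    static String readString(byte[] wire) {
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] wire = "keyword".getBytes(StandardCharsets.UTF_8);
        String a = readString(wire);
        String b = readString(wire);
        // Equal contents, but two separate heap objects.
        System.out.println(a == b);                   // false
        // intern() maps both to the single pooled instance.
        System.out.println(a.intern() == b.intern()); // true
    }
}
```

With interning, every deserialized response that carries the same type ends up referencing one shared instance instead of holding its own copy.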


elasticmachine · 2021-08-12T11:42:20Z

Pinging @elastic/es-core-features (Team:Core/Features)

arteam · 2021-08-12T12:45:40Z

server/src/main/java/org/elasticsearch/action/fieldcaps/IndexFieldCapabilities.java

@@ -50,7 +50,7 @@

    IndexFieldCapabilities(StreamInput in) throws IOException {
        this.name = in.readString();
-        this.type = in.readString();
+        this.type = in.readString().intern();


I believe String.intern is usually not recommended because you have no control over the size of the string pool. It used to be pretty bad before Java 8 where the string pool was finally moved to the heap from PermGen. I think a better approach would be to use a ConcurrentMap with Weak/SoftReferences or an actual Cache if it's available.

For example, the caching approach used in Jackson:
https://github.com/FasterXML/jackson-databind/blob/2.13/src/main/java/com/fasterxml/jackson/databind/util/LRUMap.java#L24

@arteam True true, in this one instance I figured I could get away with it because we know for a fact that these strings will be in the pool (these are field types and the field mapper objects all have them as concrete strings in the source).
I could certainly see the point of using a static constant CHM and putting the interned strings in there for performance, which will probably be quite a bit faster even in recent JDKs. It's just extra code and maybe a bit of a risk in case the type ever stops coming from a constant pool of strings.

@arteam I made this a little more "high-tech" now by putting a CHM in front of intern :) Should be way faster now with the same result as before. Let me know what you think :)
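
A rough sketch of what "a CHM in front of intern" could look like. This is an illustration under assumptions, not the `StringLiteralDeduplicator` from this PR; the class name `CachedInternSketch` and its `deduplicate` method are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class CachedInternSketch {

    // Maps each interned string to itself so that repeat lookups skip String#intern.
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    static String deduplicate(String s) {
        String cached = CACHE.get(s);
        if (cached != null) {
            return cached; // fast path: plain concurrent map lookup
        }
        // Slow path: fall back to the JVM string pool, then remember the result.
        String interned = s.intern();
        CACHE.putIfAbsent(interned, interned);
        return interned;
    }

    public static void main(String[] args) {
        String x = deduplicate(new String("keyword"));
        String y = deduplicate(new String("keyword"));
        System.out.println(x == y); // true
    }
}
```

The result is the same instance `intern()` would return, so callers see no difference; only the lookup path changes.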

server/src/main/java/org/elasticsearch/common/util/StringLiteralDeduplicator.java

arteam

LGTM!

elasticmachine · 2021-08-16T19:37:36Z

Pinging @elastic/es-search (Team:Search)

henningandersen

I am largely good with this but have a couple of detailed comments.

henningandersen · 2021-08-17T06:18:02Z

server/src/main/java/org/elasticsearch/common/util/StringLiteralDeduplicator.java

+
+    private static final Logger logger = LogManager.getLogger(StringLiteralDeduplicator.class);
+
+    private static final int MAX_SIZE = 1000;


This will start thrashing once we exceed 1000 strings. I worry about the reuse of this INSTANCE for other purposes. Can we instead make a single instance that is used solely for the field capabilities type field by making a private static instance in the IndexFieldCapabilities class? If other needs arise for this deduplicator, those would thereby be separated completely.

++ made it a private static cache for now just limited to this class

henningandersen · 2021-08-17T06:20:43Z

server/src/main/java/org/elasticsearch/common/util/StringLiteralDeduplicator.java

+        final String interned = string.intern();
+        if (map.size() > MAX_SIZE) {
+            boolean cleared = false;
+            synchronized (this) {


I think exceeding MAX_SIZE is unexpected. Since this runs on the transport thread, I would advocate not synchronizing in that case and simply clearing the map in every thread that enters this branch.

Sure if we only use this in a limited case ++ to that, simplified as requested :)
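
For readers following along, the simplification discussed here might look roughly like the sketch below. It is an assumption-laden illustration, not the merged code: `BoundedDeduplicatorSketch`, the `maxSize` constructor parameter, and the `size()` accessor are hypothetical and exist only to keep the example testable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class BoundedDeduplicatorSketch {

    private final int maxSize;
    private final Map<String, String> map = new ConcurrentHashMap<>();

    BoundedDeduplicatorSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    String deduplicate(String string) {
        String cached = map.get(string);
        if (cached != null) {
            return cached;
        }
        final String interned = string.intern();
        if (map.size() > maxSize) {
            // Unexpected case: no lock is taken, any thread that observes the overflow just clears.
            map.clear();
        }
        map.put(interned, interned);
        return interned;
    }

    // Exposed only so the sketch can be inspected.
    int size() {
        return map.size();
    }
}
```

Concurrent clears may drop a few entries that were just added, but correctness does not depend on the map: a miss simply falls back to `intern()` and returns the same canonical instance.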


henningandersen

LGTM.

jtibshirani · 2021-08-17T20:38:03Z

Sorry for jumping in late. I was wondering if you had rough numbers showing how much this helps in terms of speed + memory? I am curious because it does add a little complexity, and I imagine there are other big contributors to speed/memory.

original-brownbear · 2021-08-20T11:18:29Z

No worries @jtibshirani .

> I was wondering if you had rough numbers showing how much this helps in terms of speed + memory?

Speed I'm having a hard time predicting. The only thing I have to go by here is a user running into trouble in the last step of this transport action, where using the type as a map key got slow on hashing it. I assume that's because it's a different string instance over and over, combined with the fact that its internal bytes might not be in the CPU cache after waiting for other responses for a longer period of time. So I could see a bit of a gain here from just keying by the same type string instances every time when merging, and in that specific case it would be a measurable speedup, I think.

Memory is more straightforward though, I think. If you take a field type string like "_routing", that is ~50 bytes for each string instance. That quickly translates into a couple of MB saved, and I've seen nodes under heavy load from these requests have hundreds of MB of duplicate type strings on heap in real-world dumps.
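
To make the arithmetic concrete, here is a back-of-the-envelope sketch. Only the roughly 50 bytes per instance comes from the comment above; the index count, fields per index, and number of distinct types are made-up illustrative numbers, and `DuplicateTypeStringEstimate` is a hypothetical name.

```java
final class DuplicateTypeStringEstimate {

    // Bytes held by duplicate instances: everything beyond one copy per distinct type.
    static long wastedBytes(long instances, long distinctTypes, long bytesPerInstance) {
        return (instances - distinctTypes) * bytesPerInstance;
    }

    public static void main(String[] args) {
        long indices = 1_000;        // assumption: indices covered by one request
        long fieldsPerIndex = 100;   // assumption: fields per index
        long distinctTypes = 20;     // assumption: distinct field types in use
        long bytesPerInstance = 50;  // rough per-string figure quoted above

        long wasted = wastedBytes(indices * fieldsPerIndex, distinctTypes, bytesPerInstance);
        System.out.println(wasted + " bytes"); // 4999000 bytes, i.e. about 5 MB
    }
}
```

Under these assumed numbers a single request holds about 5 MB of duplicate type strings, so many concurrent requests would plausibly add up to the hundreds of MB mentioned.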

original-brownbear · 2021-09-15T11:00:00Z

I'm merging this one now. We've identified additional spots where deduplicating these very same strings will be helpful so the added complexity will soon be reusable :)

Thanks everyone!


original-brownbear added :Data Management/Indices APIs APIs to create and manage indices and templates v8.0.0 v7.15.0 labels Aug 12, 2021

elasticmachine added the Team:Data Management Meta label for data/management team label Aug 12, 2021

original-brownbear added the >non-issue label Aug 12, 2021

arteam reviewed Aug 12, 2021

original-brownbear added 2 commits August 12, 2021 20:42

Merge remote-tracking branch 'elastic/master' into intern-index-field…

22db5f2

…-caps-type-on-deserialization

more sophisticated approach

2f53bb6

original-brownbear requested a review from arteam August 13, 2021 04:44

arteam reviewed Aug 13, 2021

original-brownbear added 2 commits August 13, 2021 13:27

Merge remote-tracking branch 'elastic/master' into intern-index-field…

9055ce1

…-caps-type-on-deserialization

shorter singleton

8be6af1

arteam approved these changes Aug 13, 2021

original-brownbear requested a review from henningandersen August 16, 2021 12:16

jtibshirani added the :Search Foundations/Mapping Index mappings, including merging and defining field types label Aug 16, 2021

elasticmachine added the Team:Search Meta label for search team label Aug 16, 2021

henningandersen reviewed Aug 17, 2021

original-brownbear added 2 commits August 17, 2021 13:54

Merge remote-tracking branch 'elastic/master' into intern-index-field…

33429ae

…-caps-type-on-deserialization

CR: simpler

711c480

original-brownbear requested a review from henningandersen August 17, 2021 14:45

henningandersen approved these changes Aug 17, 2021

mark-vieira added v7.16.0 and removed v7.15.0 labels Aug 19, 2021

original-brownbear merged commit 6d20dbc into elastic:master Sep 15, 2021

original-brownbear deleted the intern-index-field-caps-type-on-deserialization branch September 15, 2021 11:00

original-brownbear mentioned this pull request Sep 15, 2021

Fix Large Shard Count Scalability Issues #77466

Open

97 tasks

original-brownbear mentioned this pull request Sep 15, 2021

Intern IndexFieldCapabilities Type String on Read (#76405) #77754

Merged

DaveCTurner mentioned this pull request Sep 19, 2021

node-left ... reason: disconnected triggered by earlier node-left event rather than network issue #67873

Closed

jtibshirani mentioned this pull request Oct 14, 2021

Reduce memory usage and wire size of field caps internal responses #79119

Closed

jakelandis added v8.0.0-beta1 and removed v8.0.0 labels Oct 27, 2021

original-brownbear restored the intern-index-field-caps-type-on-deserialization branch April 18, 2023 20:55
