Prevent multiple sets copies while adding index aliases #115934

idegtiarenko · 2024-10-30T13:04:49Z

Prior to this change we were copying the set of aliased indices for every index added to the builder at:

elasticsearch/server/src/main/java/org/elasticsearch/cluster/metadata/Metadata.java

Lines 1925 to 1929 in 29c5b49

    
    Set<Index> indices = new HashSet<>(aliasedIndices.getOrDefault(alias, Set.of()));
    if (indices.add(index) == false) {
        return this; // indices already contained this index
    }
    aliasedIndices.put(alias, Collections.unmodifiableSet(indices));

This becomes very expensive when applying a MetadataDiff, because that operation adds all indices to the builder one by one. In particular, an alias that references N indices causes N copies of the underlying set (a situation that is fairly common with data streams).
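
To make the cost concrete, here is a minimal sketch (not the Elasticsearch code) that counts how many elements get copied under the copy-on-every-add pattern and contrasts it with accumulating into a mutable set that is wrapped once at the end. The class `AliasCopyDemo`, its method names, the use of `Integer` in place of `Index`, and the `"alias"` key are all hypothetical and chosen only for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class; names are illustrative and not taken from Elasticsearch.
class AliasCopyDemo {

    // Mirrors the old pattern: clone the existing set before every single add.
    // Returns the total number of elements copied while adding n indices.
    static long copyOnEveryAdd(int n) {
        Map<String, Set<Integer>> aliased = new HashMap<>();
        long copied = 0;
        for (int index = 0; index < n; index++) {
            Set<Integer> current = aliased.getOrDefault("alias", Set.of());
            copied += current.size(); // every existing element is copied again
            Set<Integer> indices = new HashSet<>(current);
            indices.add(index);
            aliased.put("alias", Collections.unmodifiableSet(indices));
        }
        return copied;
    }

    // Alternative: mutate one set per alias, then wrap each set exactly once.
    static Map<String, Set<Integer>> accumulateThenFreeze(int n) {
        Map<String, Set<Integer>> aliased = new HashMap<>();
        for (int index = 0; index < n; index++) {
            aliased.computeIfAbsent("alias", k -> new HashSet<>()).add(index);
        }
        // Replace each value in place with a read-only view; no elements are copied.
        aliased.replaceAll((alias, indices) -> Collections.unmodifiableSet(indices));
        return aliased;
    }

    public static void main(String[] args) {
        // 0 + 1 + ... + 999 element copies for 1000 indices behind one alias
        System.out.println(copyOnEveryAdd(1000));
        System.out.println(accumulateThenFreeze(1000).get("alias").size());
    }
}
```

With one alias over N indices the first method copies 0 + 1 + ... + (N - 1) elements in total, so the work grows quadratically, while the second touches each index once.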

Closes: #110217

elasticsearchmachine · 2024-10-30T13:05:18Z

Pinging @elastic/es-distributed (Team:Distributed)

ywangd

I have a comment mostly for my education.

ywangd · 2024-10-31T05:27:49Z

server/src/main/java/org/elasticsearch/common/collect/ImmutableOpenMap.java

+        public VType putIfAbsent(KType key, Supplier<VType> value) {
+            maybeCloneMap();
+            VType present = mutableMap.get(key);
+            if (present == null) {
+                present = value.get();
+                mutableMap.put(key, present);
+            }
+            return present;
+        }
+
+        @SuppressWarnings("unchecked")
+        public void transformValues(UnaryOperator<VType> transformer) {
+            maybeCloneMap();
+            for (int i = 0; i < mutableMap.values.length; i++) {
+                if (mutableMap.values[i] != null) {
+                    mutableMap.values[i] = transformer.apply((VType) mutableMap.values[i]);
+                }
+            }
+        }


I am not familiar with ImmutableOpenMap and ObjectObjectHashMap, so I am not sure whether we'd prefer these new methods, especially the one that accesses and assigns directly to mutableMap.values. I think it is likely more efficient since it avoids allocating another map. But to be on the safe side, I'd just use a regular map in Metadata#Builder#build and then create an ImmutableOpenMap from it with something like the following:

    // use a HashMap `m` to populate aliases
    var mb = ImmutableOpenMap.<String, Set<Index>>builder(m.size());
    m.forEach((k, v) -> mb.put(k, Collections.unmodifiableSet(v)));
    ...

It has one more map allocation. But that feels acceptable and we don't need to touch anything in ImmutableOpenMap.

idegtiarenko · 2024-10-31

The main reason for using ImmutableOpenMap and extending it with new operations was to avoid copying the map when building. The example above effectively copies the map in the .forEach call at the end. In contrast, the ImmutableOpenMap builder reuses its internal structure:

elasticsearch/server/src/main/java/org/elasticsearch/common/collect/ImmutableOpenMap.java

Lines 350 to 352 in a59c182

    ObjectObjectHashMap<KType, VType> mutableMap = this.mutableMap;
    this.mutableMap = null; // null out the map so that you can't reuse this builder
    return mutableMap.isEmpty() ? of() : new ImmutableOpenMap<>(mutableMap);

Overall I do not have a strong preference for either approach. Both of them would result in less copying than we have today.

ywangd · 2024-10-31

Yeah, your change is definitely more efficient. It does not copy but still has a re-assignment after the transform, so it may not be far off. I was not sure what the best practice was with hppc maps. I vaguely remembered it being necessary for performance. So let's not use a regular HashMap.

I googled a bit and it seems the official site actually promotes direct buffer access as the fastest approach. But it also checks an allocated field, since not every element in values is assigned. The website is outdated, though, and there is no longer such a field. Their GitHub examples no longer have such usages and instead use the iterator, which has the following code at its heart:

    protected ObjectObjectCursor<KType, VType> fetch() {
        if (slot < max) {
            KType existing;
            for (slot++; slot < max; slot++) {
                if (!((existing = (KType) keys[slot]) == null)) {
                    cursor.index = slot;
                    cursor.key = existing;
                    cursor.value = (VType) values[slot];
                    return cursor;
                }
            }
        }
        if (slot == max && hasEmptyKey) {
            cursor.index = slot;
            cursor.key = null;
            cursor.value = (VType) values[max];
            slot++;
            return cursor;
        }
        return done();
    }

I am not entirely sure whether we should use it, since its performance is likely worse than directly manipulating the arrays. But it feels safer. Alternatively, we should add the null check to your version, the way the iterator does it.
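
For illustration only, here is a small sketch of what an in-place value transform with an iterator-style occupancy check could look like. It is a hypothetical miniature of an open-addressing map with parallel key and value arrays, not HPPC code: the class `SlotScanDemo`, the fixed capacity, and the linear probing are all assumptions, and unlike HPPC it does not support an empty (null) key.

```java
import java.util.function.UnaryOperator;

// Hypothetical miniature of an open-addressing map with parallel arrays.
// A null entry in `keys` marks an unused slot. This is not HPPC code.
class SlotScanDemo {
    private final String[] keys;
    private final Integer[] values;

    SlotScanDemo(int capacity) {
        keys = new String[capacity];
        values = new Integer[capacity];
    }

    // Linear probing insert; no resizing, so the table can fill up.
    void put(String key, Integer value) {
        int slot = Math.floorMod(key.hashCode(), keys.length);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[slot] == null || keys[slot].equals(key)) {
                keys[slot] = key;
                values[slot] = value;
                return;
            }
            slot = (slot + 1) % keys.length;
        }
        throw new IllegalStateException("table is full");
    }

    Integer get(String key) {
        int slot = Math.floorMod(key.hashCode(), keys.length);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[slot] == null) {
                return null; // reached an unused slot: the key is absent
            }
            if (keys[slot].equals(key)) {
                return values[slot];
            }
            slot = (slot + 1) % keys.length;
        }
        return null;
    }

    // Rewrites values in place. Occupancy is decided by the key array,
    // so the transformer is never applied to an unused slot.
    void transformValues(UnaryOperator<Integer> transformer) {
        for (int slot = 0; slot < keys.length; slot++) {
            if (keys[slot] != null) {
                values[slot] = transformer.apply(values[slot]);
            }
        }
    }

    public static void main(String[] args) {
        SlotScanDemo map = new SlotScanDemo(8);
        map.put("a", 1);
        map.put("b", 2);
        map.transformValues(v -> v * 10);
        System.out.println(map.get("a") + " " + map.get("b"));
    }
}
```

Checking the key array rather than the value array keeps the scan independent of what the values contain, which is the property the iterator code above relies on.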

nicktindall

LGTM (pending addressing @ywangd's concerns, which I don't have a strong opinion about)

original-brownbear

LGTM from my end, I think this is close to as fast as it can get and a nice simplification!

ywangd

LGTM
Thanks for the iteration!

elasticsearchmachine · 2024-11-01T08:49:51Z

💚 Backport successful

Status	Branch	Result
✅	8.16
✅	8.15
✅	8.x

Prevent multiple sets copies while adding index aliases

0d76334

idegtiarenko requested review from ywangd and original-brownbear October 30, 2024 13:04

elasticsearchmachine added the v9.0.0 label Oct 30, 2024

idegtiarenko mentioned this pull request Oct 30, 2024

Avoid copying aliases if they already contain target index #115855

Closed

ywangd reviewed Oct 31, 2024

nicktindall approved these changes Oct 31, 2024

idegtiarenko added the v8.17.0 label Oct 31, 2024

idegtiarenko added 3 commits October 31, 2024 11:35

do not add new operations to immutable open map builder

511cff9

Merge branch 'main' into prevent_unnecessary_alias_copies

f3a4aea

Merge branch 'main' into prevent_unnecessary_alias_copies

48a1043

original-brownbear approved these changes Oct 31, 2024

ywangd approved these changes Nov 1, 2024

idegtiarenko added v8.16.1, auto-backport, v8.16.0, v8.15.4 and removed v8.16.1 labels Nov 1, 2024

Merge branch 'main' into prevent_unnecessary_alias_copies

18d6cf4

idegtiarenko merged commit 889f015 into elastic:main Nov 1, 2024
16 checks passed

This was referenced Nov 1, 2024

[8.16] Prevent multiple sets copies while adding index aliases (#115934) #116066

Merged

[8.15] Prevent multiple sets copies while adding index aliases (#115934) #116067

Merged

[8.x] Prevent multiple sets copies while adding index aliases (#115934) #116068

Merged

idegtiarenko added a commit to idegtiarenko/elasticsearch that referenced this pull request Nov 1, 2024

Prevent multiple sets copies while adding index aliases (elastic#115934)

26f449f

idegtiarenko added a commit to idegtiarenko/elasticsearch that referenced this pull request Nov 1, 2024

Prevent multiple sets copies while adding index aliases (elastic#115934)

d98d8b1

idegtiarenko added a commit to idegtiarenko/elasticsearch that referenced this pull request Nov 1, 2024

Prevent multiple sets copies while adding index aliases (elastic#115934)

89c96a1

idegtiarenko deleted the prevent_unnecessary_alias_copies branch November 1, 2024 08:51

elasticsearchmachine pushed a commit that referenced this pull request Nov 1, 2024

Prevent multiple sets copies while adding index aliases (#115934) (#1…

f022a53

…16067)

elasticsearchmachine pushed a commit that referenced this pull request Nov 1, 2024

Prevent multiple sets copies while adding index aliases (#115934) (#1…

b99189d

…16066)

elasticsearchmachine pushed a commit that referenced this pull request Nov 1, 2024

Prevent multiple sets copies while adding index aliases (#115934) (#1…

504b29a

…16068)

jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Nov 4, 2024

Prevent multiple sets copies while adding index aliases (elastic#115934)

52811ac


