-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent multiple sets copies while adding index aliases #115934
Prevent multiple sets copies while adding index aliases #115934
Conversation
Pinging @elastic/es-distributed (Team:Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a comment mostly for my education.
public VType putIfAbsent(KType key, Supplier<VType> value) { | ||
maybeCloneMap(); | ||
VType present = mutableMap.get(key); | ||
if (present == null) { | ||
present = value.get(); | ||
mutableMap.put(key, present); | ||
} | ||
return present; | ||
} | ||
|
||
@SuppressWarnings("unchecked") | ||
public void transformValues(UnaryOperator<VType> transformer) { | ||
maybeCloneMap(); | ||
for (int i = 0; i < mutableMap.values.length; i++) { | ||
if (mutableMap.values[i] != null) { | ||
mutableMap.values[i] = transformer.apply((VType) mutableMap.values[i]); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with ImmutableOpenMap and ObjectObjectHashMap. So I am not sure whether we'd prefer these new methods especially the one that access and assign directly to mutableMap.values. I think it is likely more efficient since it avoids allocate another map. But to play on the safe side, I'd just use a regular map in Metadata#Builder#build and then create an ImmutableOpenMap from it with something like the follows:
// use a HashMap `m` to populate aliases
var mb = ImmutableOpenMap.<String, Set<Index>>builder(m.size());
m.forEach((k, v) -> mb.put(k, Collections.unmodifiableSet(v)));
...
It has one more map allocation. But that feels acceptable and we don't need to touch anything in ImmutableOpenMap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason for using ImmutableOpenMap
and extending it with new operations was to avoid copying map when building. Example above effectively copies map in .forEach
call in the end. In contrast ImmutableOpenMap builder reuses internal structure:
elasticsearch/server/src/main/java/org/elasticsearch/common/collect/ImmutableOpenMap.java
Lines 350 to 352 in a59c182
ObjectObjectHashMap<KType, VType> mutableMap = this.mutableMap; | |
this.mutableMap = null; // null out the map so that you can't reuse this builder | |
return mutableMap.isEmpty() ? of() : new ImmutableOpenMap<>(mutableMap); |
Overall I do not have a strong preference for either of approaches. Both of them would result in less copying than we have today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah your change is definitely more efficient. It does not copy but still has re-assignment after the transform. So it may not be far off. I was not sure what was the best pratice with hppc maps. I vaguely remembered it being necessary for performance. So let's not use a regular HashMap.
I google'd a bit and it seems the official site actually promotes direct buffer access as the fastest approach. But it also checks an allocated
field since not every element in values
is assigned. But the website is outdated and there is no long such field. Their GitHub examples no longer has such usages but instead uses the iterator which has the following code at its heart:
protected ObjectObjectCursor<KType, VType> fetch() {
if (slot < max) {
KType existing;
for (slot++; slot < max; slot++) {
if (!((existing = (KType) keys[slot]) == null)) {
cursor.index = slot;
cursor.key = existing;
cursor.value = (VType) values[slot];
return cursor;
}
}
}
if (slot == max && hasEmptyKey) {
cursor.index = slot;
cursor.key = null;
cursor.value = (VType) values[max];
slot++;
return cursor;
}
return done();
}
I am not entirely sure whether we should use it since its performance is likely worse than directly manipulating the arrays. But it feels safer. Or alternatively we should add the null
check like how the iterator does it in your version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (pending addressing @ywangd's concerns, which I don't have a strong opinion about)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from my end, I think this is close to as fast as it can get and a nice simplification!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks for the iteration!
Prior to this change we were copying aliases map for every index change in the builder at:
elasticsearch/server/src/main/java/org/elasticsearch/cluster/metadata/Metadata.java
Lines 1925 to 1929 in 29c5b49
This is becoming very expensive when applying a MetadataDiff as part of this operation is adding all indices one by one to the builder. In particular this results in N copies for the underlying set when an alias references N indices (this situation is fairly common with data streams).
Closes: #110217