Bulk merge field-caps responses using mapping hash #86323
Pinging @elastic/es-search (Team:Search)
Hi @dnhatn, I've created a changelog YAML for you.
This is a nice idea!
"The most common usage of field-caps is retrieving the field-caps of groups of indices having the same index mappings."
I'm not sure about this. Even if all indices use ECS, I think there are often differences (one index has a field that another is missing, etc.). Is it worth generalizing this approach so that it still works when there are multiple mappings?
Also, how does this optimization interact with #84598? I guess it won't be as helpful if we only send one request per mapping hash?
I haven't looked into how to apply this optimization to multiple mappings. I think it will take more time to make the multiple-mapping case as efficient as the single-mapping case.
They are unrelated. #84598 sends a single request but responds with multiple responses that share the underlying map, so without this optimization the merge process does the same amount of work. @jtibshirani, thank you for the review. I'd prefer to get this optimization in first and then work on a general optimization. We will remove this (fairly small) optimization entirely if the general optimization yields similar performance. WDYT?
@jtibshirani I've pushed a general optimization for the merging process. It reduces the response time significantly (by 9-10x) for field-caps requests targeting either a single index mapping or multiple index mappings. Can you take another look? Thank you!
Thanks for revising it; it's really nice to have a more general approach. ~10x is a great speedup!
One thing I was wondering: could we simplify the logic by changing the structure of FieldCapabilitiesNodesResponse to directly expose the indices grouped by mapping hash? It feels a bit funny that the coordinator receives grouped responses over the wire, expands them out into one response per index, and then groups them again by mapping hash. If FieldCapabilitiesNodesResponse exposed the mapping hash -> index map, then we could just check whether that mapping hash has already been processed and, if so, skip it? This would also avoid relying on map object equality as the current strategy does, which feels a little tricky and fragile:
```java
if (indexResponses.get(lastPendingIndex).get() != indexResponses.get(i).get()) { ... }
```
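A minimal sketch of the reviewer's suggestion, assuming a hypothetical node response that keeps the mapping hash -> indices grouping it was sent with. GroupedNodeResponse and GroupedCoordinator are illustrative names, not the actual FieldCapabilitiesNodesResponse API. The coordinator records every index under its mapping hash but visits the field map for a given hash only once, so there is no per-index expansion and no object-equality check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical node-level response that keeps indices grouped by mapping hash. */
final class GroupedNodeResponse {
    final Map<String, List<String>> indicesByMappingHash;       // mapping hash -> index names
    final Map<String, Map<String, String>> fieldsByMappingHash; // mapping hash -> (field -> type)

    GroupedNodeResponse(Map<String, List<String>> indicesByMappingHash,
                        Map<String, Map<String, String>> fieldsByMappingHash) {
        this.indicesByMappingHash = indicesByMappingHash;
        this.fieldsByMappingHash = fieldsByMappingHash;
    }
}

/** Hypothetical coordinator that never expands groups back into per-index responses. */
final class GroupedCoordinator {
    final Map<String, List<String>> indicesByHash = new LinkedHashMap<>();
    final Map<String, Map<String, String>> fieldsByHash = new HashMap<>();
    int fieldMapsProcessed = 0; // number of field maps actually visited

    void accept(GroupedNodeResponse node) {
        for (Map.Entry<String, List<String>> group : node.indicesByMappingHash.entrySet()) {
            String hash = group.getKey();
            // Always record which indices use this mapping.
            indicesByHash.computeIfAbsent(hash, h -> new ArrayList<>()).addAll(group.getValue());
            // Skip the field map if a mapping with the same hash was already processed.
            if (!fieldsByHash.containsKey(hash)) {
                // Stand-in for the real merge: keep the field map of the first occurrence.
                fieldsByHash.put(hash, node.fieldsByMappingHash.get(hash));
                fieldMapsProcessed++;
            }
        }
    }
}
```

When two nodes both report the same mapping hash, the second node's indices are appended to the existing group, but its field map is not visited again.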
@jtibshirani @javanna I've revised this PR. Can you please take another look?
I added a comment explaining why we use object equality instead of comparing mapping hashes.
Is the comparison using
Thanks for looking into this @romseygeek. I've pushed 51eb999 to remove the object equality comparison.
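A small sketch contrasting two ways of detecting consecutive responses that can be merged in bulk, assuming a hypothetical HashedIndexResponse type: the reference-identity check from the snippet quoted earlier, and a comparison of mapping hashes. Two field maps that are equal but were built separately are different objects, so the identity check splits them into separate runs while the hash comparison keeps them together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical per-index response carrying its mapping hash and field map. */
final class HashedIndexResponse {
    final String indexName;
    final String mappingHash;
    final Map<String, String> fields;

    HashedIndexResponse(String indexName, String mappingHash, Map<String, String> fields) {
        this.indexName = indexName;
        this.mappingHash = mappingHash;
        this.fields = fields;
    }
}

final class RunSplitter {
    /** Groups consecutive responses whose mapping hashes are equal. */
    static List<List<HashedIndexResponse>> runsByHash(List<HashedIndexResponse> responses) {
        return split(responses, false);
    }

    /** Groups consecutive responses whose field maps are the same object. */
    static List<List<HashedIndexResponse>> runsByIdentity(List<HashedIndexResponse> responses) {
        return split(responses, true);
    }

    private static List<List<HashedIndexResponse>> split(List<HashedIndexResponse> responses,
                                                         boolean byIdentity) {
        List<List<HashedIndexResponse>> runs = new ArrayList<>();
        List<HashedIndexResponse> current = new ArrayList<>();
        for (HashedIndexResponse r : responses) {
            if (!current.isEmpty()) {
                HashedIndexResponse last = current.get(current.size() - 1);
                boolean sameMapping = byIdentity
                    ? last.fields == r.fields                 // only detects a shared map instance
                    : last.mappingHash.equals(r.mappingHash); // compares the hash values
                if (!sameMapping) {
                    runs.add(current); // close the pending run; it can be merged in bulk
                    current = new ArrayList<>();
                }
            }
            current.add(r);
        }
        if (!current.isEmpty()) {
            runs.add(current);
        }
        return runs;
    }
}
```

Only adjacent responses are grouped here. The sketch assumes responses with equal hashes arrive next to each other (for example after sorting by hash); that ordering is an assumption, not something stated in this PR.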
This looks good to me 👍 I also suspect that comparing the mapping hashes won't be too slow.
LGTM, thanks @dnhatn!
I added
@jtibshirani @romseygeek @Leaf-Lin thank you for the reviews.
The most common usage of field-caps is retrieving the field-caps of a group of indices that share the same index mapping. We can speed up the merging process by bulk-merging the index responses that have the same mapping hash.
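A minimal sketch of the bulk-merge idea, assuming a simplified model in which each per-index response carries only a mapping hash and a field-name-to-type map. IndexFieldCaps and BulkMerger are hypothetical names, not Elasticsearch classes, and the real field-caps response also tracks properties such as searchable and aggregatable. Indices are grouped by mapping hash, and each distinct mapping's fields are visited once, with the whole group of index names added in one step rather than index by index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-index response: index name, mapping hash, and field name -> field type. */
final class IndexFieldCaps {
    final String indexName;
    final String mappingHash;
    final Map<String, String> fieldTypes;

    IndexFieldCaps(String indexName, String mappingHash, Map<String, String> fieldTypes) {
        this.indexName = indexName;
        this.mappingHash = mappingHash;
        this.fieldTypes = fieldTypes;
    }
}

final class BulkMerger {
    /** Merges responses into field -> type -> indices, visiting each distinct mapping once. */
    static Map<String, Map<String, List<String>>> merge(List<IndexFieldCaps> responses) {
        // Group responses by mapping hash; equal hashes are assumed to mean identical fields.
        Map<String, List<IndexFieldCaps>> byHash = new LinkedHashMap<>();
        for (IndexFieldCaps r : responses) {
            byHash.computeIfAbsent(r.mappingHash, h -> new ArrayList<>()).add(r);
        }
        Map<String, Map<String, List<String>>> merged = new TreeMap<>();
        for (List<IndexFieldCaps> group : byHash.values()) {
            List<String> indices = new ArrayList<>();
            for (IndexFieldCaps r : group) {
                indices.add(r.indexName);
            }
            // Bulk step: iterate one representative's fields and add the whole group at once.
            for (Map.Entry<String, String> field : group.get(0).fieldTypes.entrySet()) {
                merged.computeIfAbsent(field.getKey(), k -> new TreeMap<>())
                      .computeIfAbsent(field.getValue(), t -> new ArrayList<>())
                      .addAll(indices);
            }
        }
        return merged;
    }
}
```

With N indices sharing one mapping of F fields, the number of map lookups in this sketch drops from N×F to F, although each update still appends N index names.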
This change reduces the response time by about 10x in the many_shards benchmark for the following requests:
- GET /auditbeat*/_field_caps?fields=* (single index mapping)
- GET /*/_field_caps?fields=* (i.e. multiple index mappings)