Group field-caps node requests by index mapping hash #84598
Pinging @elastic/es-search (Team:Search)
Hi @dnhatn, I've created a changelog YAML for you.
Do I understand correctly that before, we would ask for field_caps for all indices and eventually get back one set of fields per distinct mapping hash, whereas with this change we send a single request per mapping hash and then apply the same response to all indices that have the same hash?
This has the advantage of minimizing the number of round trips when many indices have the same mappings. Previously, many indices all with the same mappings would likely send one request per data node and get back the full set of fields once from each node. That is already much better than what we did originally, when we repeated the set of fields for every index, resulting in a much bigger transport response. With this change, though, we would ask only one node and get back the full set of fields once from it, so the number of requests is no longer a function of the number of data nodes (again, provided that the mappings are the same for all indices involved).
This change is a great improvement, but it would be even better to be able to measure the improvement through benchmarks; relates to #84504. @original-brownbear would you mind having a look too, given your involvement with the many shards scalability project?
This looks like a great optimization! I left a few small comments but the logic + tests look good to me.
I wonder if Kibana is using index_filter or not; this will really affect when the optimization helps! Unfortunately, I think it would require a big change to how we execute field caps in order to get it to work with index_filter...
server/src/main/java/org/elasticsearch/action/fieldcaps/RequestDispatcher.java
Sorry for the delay here, Nhat! I'm looking at this today and will try to at least manually benchmark this a little.
@dnhatn I benchmarked this against the many shards benchmark setup now. Interestingly, it does not provide any measurable throughput improvement. My best guess as to why is that we are bottlenecked on the REST side of the network layer in some form (this isn't something that can be fixed here, so I wouldn't worry about it).
[Before and after benchmark screenshots not preserved.]
LGTM from my end, though I agree with Julie's points on documentation :)
see above
@original-brownbear Thank you for running the benchmark. Did you run against all indices or only
@jtibshirani @original-brownbear Thank you for your reviews. I think I have addressed your comments.
One of the options is to add the
The optimization introduced in this PR doesn't reduce memory usage or latency of field-caps requests. I will close this PR and try to get #86323 in instead. Thanks everyone for reviewing.
This optimization is for field-caps requests targeting many indices with an index pattern. Instead of reaching out to many data nodes to retrieve field-caps, we can group indices by their mapping hashes, then send a single node request with representative indices. This optimization is significant in large clusters.
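To make the grouping idea concrete, here is a minimal, standalone Java sketch. It is not the code from RequestDispatcher.java: the class name MappingHashGrouping, the methods groupByMappingHash and resolvePerIndex, and the fetchFieldCaps callback are all hypothetical names chosen for illustration. It also assumes every index has a non-null mapping hash and that index_filter is not in play.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch only; names are hypothetical, not Elasticsearch APIs.
final class MappingHashGrouping {
    private MappingHashGrouping() {}

    // Groups index names by their mapping hash, preserving first-seen order.
    // Assumes every index in the input has a non-null mapping hash.
    static Map<String, List<String>> groupByMappingHash(Map<String, String> mappingHashByIndex) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : mappingHashByIndex.entrySet()) {
            groups.computeIfAbsent(entry.getValue(), hash -> new ArrayList<>()).add(entry.getKey());
        }
        return groups;
    }

    // Asks for field caps once per group, using the first index as the
    // representative, then reuses that response for every index in the group.
    static <R> Map<String, R> resolvePerIndex(Map<String, List<String>> groups,
                                              Function<String, R> fetchFieldCaps) {
        Map<String, R> capsByIndex = new LinkedHashMap<>();
        for (List<String> indices : groups.values()) {
            R caps = fetchFieldCaps.apply(indices.get(0)); // one lookup per distinct mapping
            for (String index : indices) {
                capsByIndex.put(index, caps);
            }
        }
        return capsByIndex;
    }
}
```

With three indices where two share a hash, fetchFieldCaps would be invoked twice rather than three times, so the number of lookups depends on the number of distinct mappings instead of the number of indices or data nodes.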