
Group field-caps node requests by index mapping hash #84598

Closed
wants to merge 20 commits into from

Conversation

dnhatn
Member

@dnhatn dnhatn commented Mar 3, 2022

This optimization is for field-caps requests that target many indices with an index pattern. Instead of reaching out to many data nodes to retrieve field caps, we group indices by their mapping hashes and send a single node request per group, carrying only representative indices. This optimization is significant in large clusters.
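The grouping idea can be illustrated with a small sketch. This is a hedged illustration only, not the actual Java implementation in `RequestDispatcher`; `mapping_hash_of` and `node_of` are hypothetical lookups standing in for cluster-state metadata:

```python
from collections import defaultdict

def group_requests(indices, mapping_hash_of, node_of):
    """Group indices by mapping hash; one representative index per group."""
    groups = defaultdict(list)
    for index in indices:
        groups[mapping_hash_of(index)].append(index)

    # One node-level request per distinct mapping hash, carrying only a
    # representative index; its response is then applied to the whole group.
    requests = []
    for mapping_hash, members in groups.items():
        representative = members[0]
        requests.append((node_of(representative), representative, members))
    return requests

# Example: three indices, two sharing the same mapping hash.
hashes = {"logs-000001": "h1", "logs-000002": "h1", "metrics-000001": "h2"}
nodes = {"logs-000001": "node_a", "logs-000002": "node_b", "metrics-000001": "node_a"}
reqs = group_requests(list(hashes), hashes.get, nodes.get)
# Two node requests instead of three: one per distinct mapping hash.
```

With identical mappings across a large index pattern, the request count collapses to the number of distinct mapping hashes rather than the number of indices or nodes.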

@dnhatn dnhatn added the v8.2.0 label Mar 3, 2022
@dnhatn dnhatn force-pushed the field_caps_groups branch from cb0b7ec to 62e964f Compare March 17, 2022 02:07
@dnhatn dnhatn force-pushed the field_caps_groups branch from 712f21e to d0295d6 Compare March 17, 2022 02:59
@dnhatn dnhatn added >enhancement :Search/Search Search-related issues that do not fall into other categories labels Mar 18, 2022
@dnhatn dnhatn marked this pull request as ready for review March 18, 2022 14:14
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Mar 18, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

@dnhatn dnhatn requested review from javanna and jtibshirani March 18, 2022 14:37
@javanna
Member

javanna commented Mar 21, 2022

Do I understand correctly that before this change we would ask every relevant data node for field_caps for all indices, and eventually get back one set of fields per distinct mapping hash, whereas with this change we send a single request per mapping hash and then apply the same response to all indices that share that hash? This minimizes the number of roundtrips when many indices have the same mappings. Previously, many indices with identical mappings would trigger one request per data node, each returning the full set of fields once. That was already much better than repeating the set of fields for every index, which produced much bigger transport responses. With this change, though, we ask only one node and get the full set of fields back once from it, so the number of requests is no longer a function of the number of data nodes (again, provided that mappings are the same for all indices involved).
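To make the request-count argument concrete, here is a small illustrative sketch (hypothetical helper names, not the actual dispatch code):

```python
def node_requests_before(indices, node_of):
    # Before this change: one field-caps node request per data node
    # that hosts any of the targeted indices.
    return len({node_of(i) for i in indices})

def node_requests_after(indices, mapping_hash_of):
    # After this change: one node request per distinct mapping hash,
    # independent of how many data nodes the indices live on.
    return len({mapping_hash_of(i) for i in indices})

# 1000 indices with identical mappings, spread round-robin over 50 nodes.
indices = [f"logs-{n:06d}" for n in range(1000)]
before = node_requests_before(indices, lambda i: f"node_{int(i.split('-')[1]) % 50}")
after = node_requests_after(indices, lambda i: "same-mapping-hash")
# before: 50 requests; after: 1 request.
```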

This change is a great improvement, but it would be even better to be able to measure the improvement through benchmarks (relates to #84504).

@original-brownbear would you mind having a look too given your involvement with the many shards scalability project?

@original-brownbear original-brownbear requested review from original-brownbear and removed request for javanna March 23, 2022 09:24
Contributor

@jtibshirani jtibshirani left a comment


This looks like a great optimization! I left a few small comments but the logic + tests look good to me.

I wonder whether Kibana uses index_filter or not; that will really affect when this optimization helps! Unfortunately, I think it would require a big change to how we execute field caps in order to get it to work with index_filter...

@original-brownbear
Member

Sorry for the delay here, Nhat! I'm looking at this today and will try to at least manually benchmark it a little.

@original-brownbear
Member

@dnhatn I benchmarked this against the many-shards benchmark setup. Interestingly, it does not provide any (measurable) throughput improvement. My best guess is that we are bottlenecked in some form on the REST side of the network layer (this isn't something that can be fixed here, so I wouldn't worry about it).
It does, however, measurably reduce GC pressure (about half as much CPU goes into GC after these changes). While I was not able to measure the effect on the number of bytes and messages the transport layer has to handle (not because the reductions aren't there, just because of the complexity of setting up that experiment), reducing them is always a win in terms of stability.

The before and after of innerMerge (which accounts for effectively all the CPU load on the coordinating node during field_caps) also look nicer now: the JIT is able to compile the capturing lambda key -> new FieldCapabilities.Builder(field, key) better, which saves us some CPU on the coordinator.
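For context, the coordinator-side merge follows a pattern like this hedged sketch (hypothetical names and simplified capability keys; the real code is Java, where the capturing lambda `key -> new FieldCapabilities.Builder(field, key)` plays the role that `setdefault` plays here):

```python
def inner_merge(responses, merged):
    """Fold per-index field-caps responses into one builder map.

    merged maps (field, capability_key) -> builder; setdefault stands in
    for computeIfAbsent with a capturing lambda in the Java code.
    """
    for index_name, fields in responses:
        for field, caps in fields.items():
            key = (caps["type"], caps["searchable"], caps["aggregatable"])
            builder = merged.setdefault((field, key), {"indices": [], **caps})
            builder["indices"].append(index_name)
    return merged

responses = [
    ("logs-000001", {"@timestamp": {"type": "date", "searchable": True, "aggregatable": True}}),
    ("logs-000002", {"@timestamp": {"type": "date", "searchable": True, "aggregatable": True}}),
]
merged = inner_merge(responses, {})
# One merged entry covering both indices, since their capabilities agree.
```

Fewer per-index responses to fold (one per mapping-hash group instead of one per index) means fewer iterations of this hot loop on the coordinator.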

before: [profiler screenshot]

after: [profiler screenshot]

-> LGTM from my end though I agree with Julie's points on documentation :)

Member

@original-brownbear original-brownbear left a comment


see above

@dnhatn
Member Author

dnhatn commented May 2, 2022

@original-brownbear Thank you for running the benchmark. Did you run against all indices or only auditbeat* indices? I see a small improvement (7%) with auditbeat* indices only.

| Metric                   | Task       | Baseline | Contender | Diff     | Unit | Diff %  |
|--------------------------|------------|----------|-----------|----------|------|---------|
| 50th percentile latency  | field-caps | 4576.27  | 4254.51   | -321.753 | ms   | -7.03%  |
| 90th percentile latency  | field-caps | 5482.8   | 4805.08   | -677.729 | ms   | -12.36% |
| 99th percentile latency  | field-caps | 6001.41  | 5248.21   | -753.198 | ms   | -12.55% |
| 100th percentile latency | field-caps | 6241.83  | 5353.3    | -888.527 | ms   | -14.24% |

@dnhatn
Member Author

dnhatn commented May 2, 2022

@jtibshirani @original-brownbear Thank you for your reviews. I think I have addressed your comments.

I wonder whether Kibana uses index_filter or not; that will really affect when this optimization helps! Unfortunately, I think it would require a big change to how we execute field caps in order to get it to work with index_filter...

One option is to add a can_match phase for requests with index_filter to eliminate unmatched copies, as we discussed. After that, we should be able to have a single execution mode in RequestDispatcher. I can do this in a follow-up.

@dnhatn dnhatn requested a review from jtibshirani May 2, 2022 19:22
@dnhatn
Member Author

dnhatn commented Jul 6, 2022

The optimization introduced in this PR doesn't reduce memory usage or latency of field-caps requests. I will close this PR and try to get #86323 in instead. Thanks everyone for reviewing.

@dnhatn dnhatn closed this Jul 6, 2022
@dnhatn dnhatn removed the v8.4.0 label Jul 6, 2022
@dnhatn dnhatn deleted the field_caps_groups branch July 6, 2022 18:23
@dnhatn dnhatn removed the request for review from jtibshirani July 6, 2022 18:23