Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk merge field-caps responses using mapping hash #86323

Merged
merged 16 commits into from
Nov 1, 2022

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented May 1, 2022

The most common usage of field-caps is retrieving the field-caps of group indices having the same index mappings. We can speed up the merging process by performing bulk merges for index responses with the same mapping hash.

This change reduces the response time by 10 times in the many_shards benchmark.

  • GET /auditbeat*/_field_caps?fields=* (single index mapping)
|  50th percentile latency |     field-caps |  4420.91  |  374.729  |   -4046.19  | ms |  -91.52% |
|  90th percentile latency |     field-caps |  5126.87  |  402.883  |   -4723.98  | ms |  -92.14% |
|  99th percentile latency |     field-caps |  5529.41  |  576.324  |   -4953.08  | ms |  -89.58% |
| 100th percentile latency |     field-caps |  6096.73  |  643.252  |   -5453.48  | ms |  -89.45% |
  • GET /*/_field_caps?fields=* * (i.e. multiple index mappings)
|  50th percentile latency | field-caps-all |  4475.04  |  395.844  |   -4079.2   | ms |  -91.15% |
|  90th percentile latency | field-caps-all |  5334.01  |  425.248  |   -4908.76  | ms |  -92.03% |
|  99th percentile latency | field-caps-all |  5628.16  |  606.959  |   -5021.2   | ms |  -89.22% |
| 100th percentile latency | field-caps-all |  6292.63  |  675.807  |   -5616.82  | ms |  -89.26% |

@dnhatn dnhatn added :Search/Search Search-related issues that do not fall into other categories >enhancement labels May 1, 2022
@dnhatn dnhatn marked this pull request as ready for review May 1, 2022 14:37
@elasticmachine elasticmachine added the Team:Search Meta label for search team label May 1, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine
Copy link
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice idea!

The most common usage of field-caps is retrieving the field-caps of group indices having the same index mappings

I'm not sure about this -- even if all indices use ECS I think there are often differences (some index has one field, another is missing it, etc.)? I'm wondering if it's worth generalizing this approach to still work when there are multiple mappings?

Also how does this optimization interact with #84598, I guess it won't be as helpful if we only send one request per mapping hash?

@dnhatn
Copy link
Member Author

dnhatn commented May 2, 2022

I'm not sure about this -- even if all indices use ECS I think there are often differences (some index has one field, another is missing it, etc.)? I'm wondering if it's worth generalizing this approach to still work when there are multiple mappings?

I didn't look into how to make this optimization for multiple mappings. I think it will take more time to make multiple mappings as efficient as single mapping.

Also how does this optimization interact with #84598, I guess it won't be as helpful if we only send one request per mapping hash?

They are unrelated. #84598 sends a single request, but responds with multiple responses that share the underlying map. The merge process will do the same load (without optimizations).

@jtibshirani Thank you for reviews. I prefer to have this optimization in first, then working on a general optimization. We will remove this optimization (pretty small) entirely if the general optimization yields a similar performance. WDYT?

@dnhatn dnhatn requested a review from jtibshirani May 2, 2022 16:38
@dnhatn dnhatn changed the title Speed up merging field-caps responses from single mapping Speed up merging field-caps responses using mapping hash May 7, 2022
@dnhatn dnhatn changed the title Speed up merging field-caps responses using mapping hash Bulk merging field-caps responses using mapping hash May 7, 2022
@dnhatn dnhatn changed the title Bulk merging field-caps responses using mapping hash Bulk merge field-caps responses using mapping hash May 7, 2022
@dnhatn
Copy link
Member Author

dnhatn commented May 8, 2022

@jtibshirani I've pushed a general optimization for the merging process. The new optimization reduces the response time significantly (9-10 times) for field-caps requests targeting single or multiple index mappings. Can you take another look? Thank you!

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for revising it, it's really nice to have a more general approach. ~10x is a great speedup!

One thing I was wondering: could we simplify the logic by changing the structure of FieldCapabilitiesNodesResponse to directly expose the indices grouped by mapping hash? It feels a bit funny that the coordinator receives grouped responses over the wire, expands these out to have one response per index, then groups them again by mapping hash. If FieldCapabilitiesNodesResponse exposed the mapping hash -> index map, then we could just check if that mapping hash has already been processed, and if so skip it? This would also avoid the need to rely on map object equality as we do in the current strategy, which feels a little tricky/ fragile:

if (indexResponses.get(lastPendingIndex).get() != indexResponses.get(i).get()) { ... }

@dnhatn dnhatn removed the request for review from original-brownbear July 6, 2022 18:24
@elasticsearchmachine elasticsearchmachine changed the base branch from master to main July 22, 2022 23:07
@dnhatn dnhatn force-pushed the faster_merge_field_caps branch 2 times, most recently from bf1773b to a2341b1 Compare October 15, 2022 01:07
@dnhatn dnhatn force-pushed the faster_merge_field_caps branch from a2341b1 to 5b836a3 Compare October 15, 2022 05:43
@dnhatn
Copy link
Member Author

dnhatn commented Oct 15, 2022

@jtibshirani @javanna I've revised this PR. Can you please take another look?

This would also avoid the need to rely on map object equality as we do in the current strategy, which feels a little tricky/ fragile.

I added a comment explaining why we use object equality instead of comparing mapping hashes.

@dnhatn dnhatn requested review from javanna and jtibshirani October 15, 2022 21:20
@javanna javanna requested review from romseygeek and removed request for javanna and jtibshirani October 31, 2022 15:52
@romseygeek
Copy link
Contributor

Is the comparison using getIndexMappingHash much slower than the object equality comparison? I agree with @jtibshirani, this feels a bit fragile as it relies on the deserialization code elsewhere sharing objects, and implementation detail which may change in the future.

@dnhatn
Copy link
Member Author

dnhatn commented Nov 1, 2022

Is the comparison using getIndexMappingHash much slower than the object equality comparison? I agree with @jtibshirani, this feels a bit fragile as it relies on the deserialization code elsewhere sharing objects, and implementation detail which may change in the future.

Thanks for looking into this @romseygeek. I've pushed 51eb999 to remove object equality comparison.

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me 👍 I also suspect that comparing the mapping hashes won't be too slow.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @dnhatn!

@Leaf-Lin
Copy link
Contributor

Leaf-Lin commented Nov 1, 2022

I added realease highlight because 10x improvement is really impressive!! @dnhatn, you should brag about it.

@dnhatn
Copy link
Member Author

dnhatn commented Nov 1, 2022

@jtibshirani @romseygeek @Leaf-Lin thank you for reviews.

@dnhatn dnhatn merged commit 5f51386 into elastic:main Nov 1, 2022
@dnhatn dnhatn deleted the faster_merge_field_caps branch November 1, 2022 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement release highlight :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v8.6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants