
Add support for merging multiple search responses into one #37566

Merged
14 commits merged on Jan 21, 2019

Conversation

@javanna (Member) commented Jan 17, 2019

This will be used in cross-cluster search when reduction will be
performed locally on each cluster. The CCS coordinating node will send
one search request per remote cluster involved and will get one search
response back from each of them. Each such response contains all the
information needed to perform an additional reduction and return results
to the user.

Relates to #32125
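As a rough illustration of that flow, here is a minimal sketch. The `RemoteCluster` client abstraction and `finalReduce` are hypothetical stand-ins, not the actual implementation; `SearchRequest` and `SearchResponse` are the existing Elasticsearch types.

```java
import java.util.ArrayList;
import java.util.List;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;

// Hypothetical sketch of the flow described above, not the actual code.
class CcsFanOutSketch {

    interface RemoteCluster {
        SearchResponse search(SearchRequest request);
    }

    SearchResponse search(SearchRequest request, List<RemoteCluster> clusters) {
        List<SearchResponse> responses = new ArrayList<>();
        for (RemoteCluster cluster : clusters) {
            // one request per remote cluster; each cluster reduces its own
            // shard results locally and sends back a single response
            responses.add(cluster.search(request));
        }
        // the CCS coordinating node performs the additional reduction,
        // producing the single response returned to the user
        return finalReduce(responses);
    }

    SearchResponse finalReduce(List<SearchResponse> responses) {
        throw new UnsupportedOperationException("illustrative placeholder");
    }
}
```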

@javanna javanna added >enhancement WIP :Search/Search Search-related issues that do not fall into other categories v7.0.0 v6.7.0 labels Jan 17, 2019
@javanna javanna requested a review from jimczi January 17, 2019 11:59
@elasticmachine (Collaborator):

Pinging @elastic/es-search

@jimczi (Contributor) left a comment:

I left some comments regarding the TODOs in the code, LGTM otherwise.
Can you add a comment about the limitations (the reduce does not handle inner_hits with collapsing, ...)?

* and pipeline aggregations have not yet been executed. Also, from+size search hits need to be requested to each cluster.
*/
//TODO it may make sense to investigate reusing existing merge code in SearchPhaseController#reducedQueryPhase, the logic is similar
//yet there are substantial differences in terms of the objects exchanged and logic in the sortDocs method.
@jimczi (Contributor):

It's more than just reusing reducedQueryPhase: as we discussed earlier, we could integrate each remote cluster response as a shard response in the initial search phase and ignore hits coming from the remote clusters in the fetch phase. This would be identical to the removed QueryAndFetch strategy, except that only the remote cluster responses would have the fetch results. This is really a nice-to-have, so no need to follow up on it, but it would be nice if the TODO mentioned it.

/**
* Add a search response to the list of responses to be merged together into one.
* Merges currently happen at once when all responses are available and {@link #getMergedResponse()} is called. That may change
* in the future as it's possible to introduce incremental merges as responses come in if necessary.
@jimczi (Contributor):

I don't think incremental merges are appealing here. The number of remote clusters should be low, so there is no benefit to doing the reduce incrementally.

@javanna (Member, Author):

I think we may come back to this one day and have a laugh about the "number of remote clusters should be low" assumption :) but I agree we are not talking about hundreds at the moment. I thought it might be useful to reduce incrementally given the size of the responses from each cluster, but we should first measure what the benefit is, if any.
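For reference, a hedged usage sketch of the one-shot merge model from the quoted javadoc; only `add` and `getMergedResponse` come from the javadoc above, while the constructor arguments and the surrounding method are assumptions.

```java
import java.util.List;

import org.elasticsearch.action.search.SearchResponse;

// One-shot merge model as described by the javadoc above; signatures beyond
// add() and getMergedResponse() are assumptions, not the final API.
SearchResponse mergeAll(List<SearchResponse> responsesFromRemoteClusters,
                        SearchResponseMerger merger) {
    for (SearchResponse clusterResponse : responsesFromRemoteClusters) {
        merger.add(clusterResponse); // buffer one response per remote cluster
    }
    // nothing is merged until this point; an incremental variant would
    // instead fold each response in inside add()
    return merger.getMergedResponse();
}
```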


TotalHits totalHits = null;
if (totalHitsRelation != null) {
//TODO totalHits may overflow if each cluster reports a very high number?
@javanna (Member, Author):

@jimczi do you have thoughts on this one? Is it paranoia on my end?

@jimczi (Contributor):

totalHits.value is a long, so I doubt it will overflow unless you have millions of remote clusters ;). However, this made me think that this PR doesn't handle track_total_hits when it is set as a number. In the normal execution we merge all the topdocs, and if the resulting total hits is greater than track_total_hits, we set the final value to the value of track_total_hits and the final relation to gte. You can check the logic here.

We'll also need to change the code when #37466 is merged, since the default for track_total_hits is going to change.
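A minimal sketch of the merge-and-clamp logic described in the comment above, using Lucene's `TotalHits`; the overflow-safe sum also addresses the TODO in the snippet. This is an illustration, not the actual SearchPhaseController code.

```java
import java.util.List;

import org.apache.lucene.search.TotalHits;
import org.apache.lucene.search.TotalHits.Relation;

// Illustration of the merge-and-clamp behaviour described above, not the
// actual Elasticsearch implementation.
final class TotalHitsMergeSketch {

    static TotalHits merge(List<TotalHits> perClusterTotals, int trackTotalHitsUpTo) {
        long value = 0;
        Relation relation = Relation.EQUAL_TO;
        for (TotalHits totals : perClusterTotals) {
            // Math.addExact guards against the (unlikely) long overflow from the TODO
            value = Math.addExact(value, totals.value);
            if (totals.relation == Relation.GREATER_THAN_OR_EQUAL_TO) {
                // if any cluster reported a lower bound, the sum is a lower bound
                relation = Relation.GREATER_THAN_OR_EQUAL_TO;
            }
        }
        if (value > trackTotalHitsUpTo) {
            // past the tracking threshold, report track_total_hits as the
            // value and gte as the relation
            return new TotalHits(trackTotalHitsUpTo, Relation.GREATER_THAN_OR_EQUAL_TO);
        }
        return new TotalHits(value, relation);
    }
}
```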

@javanna (Member, Author) commented Jan 18, 2019:

Good point, I addressed that in a325129. I think that since I reuse TopDocsStats, this will not need to change once the default trackTotalHitsUpTo changes.

I also went a step further in 8f1b063 and expanded TopDocsStats. Just an experiment, but it saves some duplicated logic; let me know what you think.

@jimczi (Contributor):

Thanks, the change looks good. Feel free to push when CI is green.

@javanna javanna removed the WIP label Jan 18, 2019
@javanna (Member, Author) commented Jan 18, 2019:

retest this please

1 similar comment
@javanna (Member, Author) commented Jan 21, 2019:

retest this please

@javanna javanna removed the v6.7.0 label Jan 21, 2019
@javanna (Member, Author) commented Jan 21, 2019:

I am going to merge this PR despite the failed builds. There are problems in the Elasticsearch CI infrastructure that are causing a lot of build failures. I have run this PR multiple times on my own CI and everything was green, which gives me confidence that I can merge it.

@javanna javanna merged commit 09a6ba5 into elastic:master Jan 21, 2019
javanna added a commit that referenced this pull request Jan 31, 2019
With #37566 we have introduced the ability to merge multiple search responses into one. That makes it possible to expose a new way of executing cross-cluster search requests that makes CCS much faster whenever there is network latency between the CCS coordinating node and the remote clusters. The coordinating node can now send a single search request to each remote cluster, which gets reduced by each one of them: from + size results are requested from each cluster, and the reduce phase in each cluster is non-final (meaning that buckets are not pruned and pipeline aggs are not executed). The CCS coordinating node performs an additional, final reduction, which produces one search response out of the multiple responses received from the different clusters.

This new execution path will be activated by default for any CCS request, unless a scroll is provided or inner hits are requested as part of field collapsing. The search API now accepts a new parameter called ccs_minimize_roundtrips that allows opting out of the default behaviour.

Relates to #32125
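As a usage note, a hedged sketch of opting out per request: the ccs_minimize_roundtrips parameter name comes from the commit message above, while the setter name on `SearchRequest` is an assumption here.

```java
import org.elasticsearch.action.search.SearchRequest;

// Equivalent of passing ?ccs_minimize_roundtrips=false to the search API.
// The query parameter name comes from the commit message; the setter name
// is assumed, not confirmed by this thread.
SearchRequest request = new SearchRequest("remote1:logs-*", "remote2:logs-*");
request.setCcsMinimizeRoundtrips(false); // opt out of the single-request-per-cluster path
```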
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories v7.0.0-beta1