Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce ability to minimize round-trips in CCS #37828

Merged
merged 33 commits into from
Jan 31, 2019

Conversation

javanna
Copy link
Member

@javanna javanna commented Jan 24, 2019

With #37566 we have introduced the ability to merge multiple search responses into one. That makes it possible to expose a new way of executing cross-cluster search requests, that makes CCS much faster whenever there is network latency between the CCS coordinating node and the remote clusters. The coordinating node can now send a single search request to each remote cluster, which gets reduced by each one of them. from + size results are requested to each cluster, and the reduce phase in each cluster is non final (meaning that buckets are not pruned and pipeline aggs are not executed). The CCS coordinating node performs an additional, final reduction, which produces one search response out of the multiple responses received from the different clusters.

This new execution path will be activated by default for any CCS request unless a scroll is provided or inner hits are requested as part of field collapsing. The search API accepts now a new parameter called ccs_minimize_roundtrips that allows to opt-out of the default behaviour.

Relates to #32125

@javanna javanna added >feature release highlight WIP :Search/Search Search-related issues that do not fall into other categories v7.0.0 labels Jan 24, 2019
@javanna javanna requested a review from jimczi January 24, 2019 15:59
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I left a minor comment regarding another pr that will conflict with this one.

@javanna javanna changed the title Introduce ccs remote reduce mode Introduce CCS remote reduce mode Jan 24, 2019
@javanna javanna requested a review from s1monw January 28, 2019 07:20
Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a question

involved, as it treats all shards the same, at the cost of sending many
requests to each remote cluster.

- `auto`: `remote` is used whenever possible. In case a scroll is provided,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so what happens if I specify remote and I have inner hits?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we have to have a mode at all. can't we simply go with local_reduce: true|false if it's true we force local if not we try remote if possible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so what happens if I specify remote and I have inner hits?

If you specify remote and you provide a scroll or request inner hits as part of fields collapsing, you get back an error. I think it's better to let users know that what they have requested is not supported when they are explicit in their request.

I wonder if we have to have a mode at all. can't we simply go with local_reduce: true|false if it's true we force local if not we try remote if possible?

I think that having remote (or false) act only as a preference is confusing, as users are requesting something that may not be possible. Ideally, if somebody sets local_reduce: false (or reduce_mode: remote), we would never go and fan out to the shards. It sounds weird to me to have an option act as a proper on/off behaviour and the other one as a preference, unless they have different names.

If we want to have only two values and get rid of the auto option, we could have reduce_mode: always_local | prefer_remote which I find much more self-explanatory than a boolean. Yet I am not sure it is much different from what we have now in the PR.

Copy link
Member Author

@javanna javanna Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or we could also rename auto to prefer_remote and keep remote.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we discussed this a bit with @jimczi and the team, we could go with a boolean but I would like to have some force part in the name of the option. We could go with ccs_force_local_reduce which defaults to false, or maybe even ccs_optimize_for_latency which defaults to true? I think I prefer the latter as it tells when such option should be used. Any other thoughts/preference?

@clintongormley
Copy link
Contributor

We could go with ccs_force_local_reduce which defaults to false, or maybe even ccs_optimize_for_latency which defaults to true? I think I prefer the latter as it tells when such option should be used.

With ccs_optimize_for_latency it is not clear to me which option I'd be opting for - i'd have to really understand the feature. Also, given that we want to use remote reduce as the default because we think it'll be better, then i think ccs_force_local_reduce: true works as a good way to opt out of that behaviour.

@s1monw
Copy link
Contributor

s1monw commented Jan 28, 2019

With ccs_optimize_for_latency it is not clear to me which option I'd be opting for - i'd have to really understand the feature. Also, given that we want to use remote reduce as the default because we think it'll be better, then i think ccs_force_local_reduce: true works as a good way to opt out of that behaviour.

I am not sure if ccs_force_local_reduce explains what's going on. the reduce is not relevant here IMO. This is more about the network calls we do. Something like css_minimize_roundtrips: true|false would be more descriptive IMO.

@zuketo
Copy link

zuketo commented Jan 28, 2019

Does the term WAN help here? E.g. ccs_wan_optimized: true|false

"We're operating across a large network, let's keep this set to true" vs "We're operating within a local network, let's investigate changing this setting"

@javanna
Copy link
Member Author

javanna commented Jan 29, 2019

I am not sure if ccs_force_local_reduce explains what's going on. the reduce is not relevant here IMO. This is more about the network calls we do. Something like css_minimize_roundtrips: true|false would be more descriptive IMO.

I agree with this, I like ccs_minimize_roundtrips, I think it explains the trade-off well, at the same time staying high-level and without mentioning reduction. I briefly chatted to @clintongormley and he also agrees. I will adapt the PR accordingly.

@s1monw
Copy link
Contributor

s1monw commented Jan 29, 2019

thanks @javanna

@javanna
Copy link
Member Author

javanna commented Jan 31, 2019

heya @s1monw can you have another look? it should be ready.

@javanna
Copy link
Member Author

javanna commented Jan 31, 2019

run elasticsearch-ci/2

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks @javanna

@javanna javanna changed the title Introduce CCS remote reduce mode Introduce ability to minimize round-trips in CCS requests Jan 31, 2019
@javanna javanna merged commit 622fb78 into elastic:master Jan 31, 2019
@javanna
Copy link
Member Author

javanna commented Jan 31, 2019

thanks a lot everybody involved! ;)

@javanna javanna changed the title Introduce ability to minimize round-trips in CCS requests Introduce ability to minimize round-trips in CCS Jan 31, 2019
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jan 31, 2019
* master: (100 commits)
  Push primary term to replication tracker (elastic#38044)
  Introduce ability to minimize round-trips in CCS (elastic#37828)
  Don't Assert Ack on when Publish Timeout is 0 in Test (elastic#38077)
  Reduce object creation in Rounding class (elastic#38061)
  Treat put-mapping calls with `_doc` as a top-level key as typed calls. (elastic#38032)
  Fix test bug when testing the merging of mappings and templates. (elastic#38021)
  spelling: java script -- not JavaScript (elastic#37057)
  Enable SSL in reindex with security QA tests (elastic#37600)
  Disable BWC tests during backport (elastic#38074)
  SQL: Added SSL configuration options tests (elastic#37875)
  Minor fixes in the release notes script. (elastic#37967)
  Fix typo in docs. (elastic#38018)
  Update Lucene repo for 7.0.0-alpha2 (elastic#37985)
  Fix size of rolling-upgrade bootstrap config (elastic#38031)
  fix DateIndexNameProcessorTests offset pattern (elastic#38069)
  Speed up converting of temporal accessor to zoned date time (elastic#37915)
  Work around JDK8 timezone bug in tests (elastic#37968)
  Correct arg names when update mapping/settings from leader (elastic#38063)
  Introduce ssl settings to reindex from remote (elastic#37527)
  Mute testRetentionLeasesSyncOnExpiration
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature release highlight :Search/Search Search-related issues that do not fall into other categories v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants