Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return true for can_match on idle search shards #55428

Merged
merged 8 commits into from
Apr 26, 2020

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Apr 18, 2020

With this change, we will always return true for can_match requests on idle search shards; otherwise, some shards will never get refreshed if all search requests perform the can_match phase (i.e., total shards > pre_filter_shard_size).

Relates #27500
Relates #50043

@dnhatn dnhatn added >bug :Search/Search Search-related issues that do not fall into other categories v8.0.0 v7.8.0 v7.7.1 labels Apr 18, 2020
@dnhatn dnhatn requested review from jpountz and jimczi April 18, 2020 17:57
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

@dnhatn
Copy link
Member Author

dnhatn commented Apr 18, 2020

@elasticmachine test this please

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we should wait/perform a refresh in the can match phase. I'd prefer that we return true for non-active shards since this phase is not throttled ?

@dnhatn
Copy link
Member Author

dnhatn commented Apr 20, 2020

I'd prefer that we return true for non-active shards since this phase is not throttled ?

++. I will update this PR. Thanks Jim.

@dnhatn dnhatn changed the title Can_match requests should wait for search active Return true for can_match on non-active search shards Apr 20, 2020
@dnhatn dnhatn changed the title Return true for can_match on non-active search shards Return true for can_match on idle search shards Apr 20, 2020
@dnhatn dnhatn requested a review from jimczi April 20, 2020 02:46
@@ -1113,6 +1113,9 @@ public CanMatchResponse canMatch(ShardSearchRequest request) throws IOException
assert request.searchType() == SearchType.QUERY_THEN_FETCH : "unexpected search type: " + request.searchType();
IndexService indexService = indicesService.indexServiceSafe(request.shardId().getIndex());
IndexShard indexShard = indexService.getShard(request.shardId().getId());
if (indexShard.isSearchIdle()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to check whether there are changes that have not been refreshed? Otherwise this change will have a terrible effiect on the warm tier by returning can_match=true up to once per second even though no changes are being made to indices? Or are we already protected against this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I pushed 0e7f976.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can still extract the min/max values of the primary sort since we don't need the values to be accurate and use it only to order the shards execution. If indices are in the warm tier they should contain some recent data that we'd like to visit first if the query is sorted by timestamp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds potentially dangerous to me. I wouldn't think of the special case of search-idle indices if someone raised an idea that would leverage min/max values and rely on the fact that the max value to be an upper bound and the min value to be a lower bound?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have this guarantee today either, the reader of the query phase can be different from the one used in the can match phase. I can update the comments to make it more clear in the code that we don't create a point in time reader during the can match phase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the can_match phase might get a different reader from the query phase, but I'm less concerned by this since the reader we get in the can_match phase would also be a legal reader to run the query phase on, which isn't the case here? I don't feel strongly about this, ok to return the min/max values and +1 to document this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I also thought about this because having these guarantees would mean that we can filter shards on the coord node based on these values. We don't do this today because of search-idle shards so we delay the skipping when verifying the values locally during the query phase.
The other option that was proposed by Nhat in this pr was to perform the refresh of search-idle shards during the can_match phase. It has the advantage of making all min/max values accurate for the search request but could make the phase much slower.

I don't feel strongly about this, ok to return the min/max values and +1 to document this.

++, thanks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Adrien and Jim for the discussion here. I pushed 80f492f.

@dnhatn dnhatn requested a review from jpountz April 20, 2020 14:30
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dnhatn dnhatn requested review from jimczi and jpountz and removed request for jimczi April 21, 2020 13:48
@dnhatn
Copy link
Member Author

dnhatn commented Apr 21, 2020

run elasticsearch-ci/1

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dnhatn
Copy link
Member Author

dnhatn commented Apr 26, 2020

@jimczi @jpountz Thanks for reviews.

@dnhatn dnhatn merged commit b2a15c6 into elastic:master Apr 26, 2020
@dnhatn dnhatn deleted the can_match_wait_for_refresh branch April 26, 2020 20:43
dnhatn added a commit that referenced this pull request Apr 27, 2020
With this change, we will always return true for can_match requests on
idle search shards; otherwise, some shards will never get refreshed if
all search requests perform the can_match phase (i.e., total shards >
pre_filter_shard_size).

Relates #27500
Relates #50043
@jimczi
Copy link
Contributor

jimczi commented May 26, 2020

@dnhatn I wonder if we should backport this fix to 7.7.1 ? We changed the heuristic for the pre_filter_shard_size in 7.7.0 so this bug is more visible in this version (#57006).

@jimczi
Copy link
Contributor

jimczi commented May 26, 2020

We discussed with @dnhatn and have decided to backport to 7.7.1 in order to provide a fix for #57006
I opened #57158

jimczi added a commit that referenced this pull request May 27, 2020
With this change, we will always return true for can_match requests on
idle search shards; otherwise, some shards will never get refreshed if
all search requests perform the can_match phase (i.e., total shards >
pre_filter_shard_size).

Relates #27500
Relates #50043

Co-authored-by: Nhat Nguyen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories v7.7.1 v7.8.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants