Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the distinction between query and filter context in QueryBuilders #35354

Merged
merged 3 commits into from
Dec 3, 2018

Conversation

jimczi
Copy link
Contributor

@jimczi jimczi commented Nov 7, 2018

When building a query Lucene distinguishes two cases, queries that require to produce a score and queries that only need to match. We cloned this mechanism in the QueryBuilders in order to be able to produce different queries based on whether they need to produce a score or not. However the only case in es that require this distinction is the BoolQueryBuilder that sets a different minimum_should_match when a bool query is built in a filter context. This behavior doesn't seem right because it makes the matching of should clauses different when the score is not required.

Closes #35293

When building a query Lucene distinguishes two cases, queries that require to produce a score and queries that
only need to match. We cloned this mechanism in the QueryBuilders in order to be able to produce different queries
based on whether they need to produce a score or not. However the only case in es that require this distinction
is the BoolQueryBuilder that sets a different minimum_should_match when a `bool` query is built in a filter context..
This behavior doesn't seem right because it makes the matching of `should` clauses different when the score is not required.

Closes elastic#35293
@jimczi jimczi added >breaking :Search/Search Search-related issues that do not fall into other categories >breaking-java v7.0.0 labels Nov 7, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@@ -384,30 +384,14 @@ protected Query doToQuery(QueryShardContext context) throws IOException {
return new MatchAllDocsQuery();
}

final String minimumShouldMatch;
if (context.isFilter() && this.minimumShouldMatch == null && shouldClauses.size() > 0) {
minimumShouldMatch = "1";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the part that confuses me. Why do we need to set minimum_should_match in this case ? I cannot think of one example where this fixes something ? @martijnvg can you confirm ?

@javanna
Copy link
Member

javanna commented Nov 8, 2018

heya @jimczi I was wondering, where is this change breaking?

@jimczi
Copy link
Contributor Author

jimczi commented Nov 8, 2018

I forgot to add the breaking change ;). Today we set minimum_should_match to 1 on bool queries with should clauses if the query is built on the filter context. I don't have the history on why we did that but this makes the following query behaves differently if it is put under a filter:

{
  "query": {
    "bool": {
	"must":   {"match": {"tags": "a"}},
	"should": {"match": {"tags": "c"}},
         "should": {"match": {"tags": "d"}}
    }
  }
}

When put in a filter context the bool query above automatically changes its minimum_should_match to 1. The change in this pr removes this behavior so the should clause would be ignored/removed if they are in a filter context. I find it difficult to reason about should clauses with the current behavior. We have documentation for the current behavior in es but I mistakenly thought that it describes the behavior when there are only should clauses in the bool query.
I have two motivations for this change, the first one is to simplify the understanding of should clauses in general. The second one is that this behavior seems to be the only reason we need to separate filter and query context in es. We had other places where this distinction was required but they are all removed now so I thought that 7 would be a good chance to remove the toQuery, toFilter distinction in QueryBuilder.

Copy link
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if also need to modify or completely remove query_filter_context.asciidoc https://github.com/elastic/elasticsearch/blob/master/docs/reference/query-dsl/query_filter_context.asciidoc

Because as I understand with your PR, we will not have query and filter contexts any more. And also what this document says about caching doesn't seem to be right (as I understood we don't cache based on query and filter context, but based on the query type)

must match a document for it to match the `bool` query. This behavior may be
explicitly controlled by settings the
<<query-dsl-minimum-should-match,`minimum_should_match`>> parameter.
|`should` |The clause (query) should appear in the matching document.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also add "optionally" here, like "should optionally appear".

Copy link
Contributor Author

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if also need to modify or completely remove query_filter_context.asciidoc https://github.com/elastic/elasticsearch/blob/master/docs/reference/query-dsl/query_filter_context.asciidoc

It's still relevant. The explanation implies that the query is sorted by relevancy so overall score of the query is taken into account. In this case the differenciation between a filter and a should clause is important. Same with the constant_score query.

I think the main question here is whether we want to make this breaking change in 7 or not.
In such case we'd also need to add deprecation warnings in 6 for queries with a mixed of should and required clauses in a filter context. If we feel that it's too late for 7 we could only deprecate now with the goal to remove it in 8.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 this distinction was mostly useful for the terms query initially, which was parsed as a BooleanQuery when scoring was enabled and TermsQuery in a filter context. We should remove it now, I agree that the current behavior on bool queries is quite confusing.

@jimczi
Copy link
Contributor Author

jimczi commented Nov 30, 2018

Thanks @jpountz , we discussed this some time ago and I forgot to update the pr. We've agreed that this change should be done in ES 7 and we'll deprecate the behavior in 6x. I'll update the pr with the missing breaking change and prepare the deprecation.

@jimczi jimczi merged commit 74aca75 into elastic:master Dec 3, 2018
@jimczi jimczi deleted the remove_filter_context branch December 3, 2018 10:49
jimczi added a commit that referenced this pull request Dec 3, 2018
This change deprecates the Elasticsearch's filter context and adds a deprecation
warning to bool queries that automatically set their minimum should match to 1.

Relates #35354
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
@matthuhiggins
Copy link

With this change, is this the shortest way to perform an unscored "OR"?

{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "should": [
            { "term": {"city": "Portland"}},
            { "term": {"state": "Washington"}}
          ],
         "minimum_should_match": 1
        }
      }
    }
  }
}

@telendt
Copy link
Contributor

telendt commented Jun 24, 2019

@matthuhiggins Yes, this is a proper way of performing "unscored disjunction (OR)". It's not the shortest way though as minimum_should_match defaults to 1 if there are only should clauses (this is a default Lucene behavior). It's still ok to specify minimum_should_match explicitly (it's also how I prefer to do it).

@matthuhiggins
Copy link

Got it, thanks. The release notes and deprecation warning leads one to think "minimum_should_match" always defaults to 0, even in the case when there are only "should" clauses.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking >breaking-java :Search/Search Search-related issues that do not fall into other categories v6.6.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Function score functions filter executed in query context
7 participants