-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalize how queries on _index
are handled at rewrite time
#52486
Generalize how queries on _index
are handled at rewrite time
#52486
Conversation
This field is a specialization of the `keyword` field for the case when all documents have the same value. It typically performs more efficiently than keywords at query time by figuring out whether all or none of the documents match at rewrite time, like `term` queries on `_index`. The name is up for discussion. I liked including `keyword` in it, so that we still have room for a `singleton_numeric` in the future. However I'm unsure whether to call it `singleton`, `constant` or something else, any opinions? For this field there is a choice between 1. accepting values in `_source` when they are equal to the value configured in mappings, but rejecting mapping updates 2. rejecting values in `_source` but then allowing updates to the value that is configured in the mapping This commit implements option 1, so that it is possible to reindex from/to an index that has the field mapped as a keyword with no changes to the source.
Pinging @elastic/es-search (:Search/Search) |
_index
are handled at rewrite time_index
are handled at rewrite time
@elasticmachine run elasticsearch-ci/2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @jpountz, makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also looks good to me, I had one suggestion I wanted to raise but don't feel strongly about. Currently the contract for classes implementing ConstantFieldType
is that they should produce either MatchNoDocsQuery
or MatchAllDocsQuery
for certain query factory methods. They should also make sure to break out of termsQuery
early if they find a match. To me this strategy felt more a bit more robust:
- Add abstract methods like
bool matchesExact(Object value, QueryShardContext context)
toConstantFieldType
. - Define final versions of
termQuery
,termsQuery
, etc. inConstantFieldType
that call into these methods.
return new MatchNoneQueryBuilder(); | ||
} | ||
QueryShardContext context = queryRewriteContext.convertToShardContext(); | ||
if (context != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small comment, could just be if (context != null && context.fieldMapper(...)) { ... }
return queryBuilder.toQuery(context); | ||
} | ||
|
||
static Query parseQuery(QueryShardContext context, boolean mapUnmappedFieldsAsString, XContentParser parser) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks unused, I think it can just be deleted?
@@ -103,7 +104,7 @@ public final Query toQuery(QueryShardContext context) throws IOException { | |||
if (boost != DEFAULT_BOOST) { | |||
if (query instanceof SpanQuery) { | |||
query = new SpanBoostQuery((SpanQuery) query, boost); | |||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my knowledge, I'm wondering why this change (and the ones to AbstractQueryTestCase
) were necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this change as a way to keep tests simple. For instance here is what ConstantScoreQueryBuilderTests#doAssertLuceneQuery
looks like in master today.
@Override
protected void doAssertLuceneQuery(ConstantScoreQueryBuilder queryBuilder, Query query, QueryShardContext context) throws IOException {
Query innerQuery = queryBuilder.innerQuery().toQuery(context);
if (innerQuery == null) {
assertThat(query, nullValue());
} else {
assertThat(query, instanceOf(ConstantScoreQuery.class));
ConstantScoreQuery constantScoreQuery = (ConstantScoreQuery) query;
assertThat(constantScoreQuery.getQuery(), instanceOf(innerQuery.getClass()));
}
}
This test only worked because most queries on unmapped fields would create the same query as on a keyword
field. But with this change, queries on unmapped fields get rewritten as a MatchNoneQueryBuilder
. And when its inner query rewrites to a MatchNoneQueryBuilder
, ConstantScoreQueryBuilder
itself rewrites to a MatchNoneQueryBuilder
. So I updated the logic this way:
@Override
protected void doAssertLuceneQuery(ConstantScoreQueryBuilder queryBuilder, Query query, QueryShardContext context) throws IOException {
Query innerQuery = queryBuilder.innerQuery().rewrite(context).toQuery(context);
if (innerQuery == null) {
assertThat(query, nullValue());
} else if (innerQuery instanceof MatchNoDocsQuery) {
assertThat(query, instanceOf(MatchNoDocsQuery.class));
} else {
assertThat(query, instanceOf(ConstantScoreQuery.class));
ConstantScoreQuery constantScoreQuery = (ConstantScoreQuery) query;
assertThat(constantScoreQuery.getQuery(), instanceOf(innerQuery.getClass()));
}
}
But this fails if the inner query is wrapped in a BoostQuery
. So I had to either change AbstractQueryBuilder
to no longer wrap MatchNoDocsQuery
with a boost, or change this test (and a couple other ones IIRC) to check whether the inner query might be wrapped inside a BoostQuery
. I chose the former, which sounded simpler to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clear explanation!
@jtibshirani Thanks for your comments, I wondered whether I should do something like that and leaned towards not doing it to avoid introducing new abstractions, but I also agree that it reduces chances to not implement the contract. I'll implement your suggestion. |
…ic#52486) Since this change refactors rewrites, I also took it as an opportunity to adrress elastic#49254: instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder. This change exposed a couple bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed. Closes elastic#49254
Generalize how queries on `_index` are handled at rewrite time (#52486) Since this change refactors rewrites, I also took it as an opportunity to adrress #49254: instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder. This change exposed a couple bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed. Closes #49254
#49713 aims at introducing a new
constant_keyword
field which, like_index
, always rewrites queries to aMatchAllQueryBuilder
or aMatchNoneQueryBuilder
in order to skip shards in thecan_match
phase. This change introduces a newConstantFieldType
marker class that helps get this functionality with any field and not just_index
.Since this change refactors rewrites, I also took it as an opportunity to adrress #49254: instead of returning the same queries you would get on a
keyword
field when a field is unmapped, queries get rewritten to aMatchNoDocsQueryBuilder
.This change exposed a couple bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the
significant_terms
aggregation doesn't rewrite its inner filter, which I fixed.Closes #49254